Hyper-parameter estimation of Gaussian processes is analyzed in an asymptotic framework.
The spatial sampling is a randomly perturbed regular grid and its deviation from the perfect
regular grid is controlled by a single scalar regularity parameter. Consistency and asymptotic
normality are proved for the Maximum Likelihood and Cross Validation estimators of the
hyper-parameters. The asymptotic covariance matrices of the hyper-parameter estimators are
deterministic functions of the regularity parameter. By means of an exhaustive study of the
asymptotic covariance matrices, it is shown that irregular sampling is generally an advantage
to estimation, but we identify cases where it is not. Therefore, a negative answer is
given to the claim that irregular sampling is always better for hyper-parameter estimation than
regular sampling.

Keywords: Uncertainty quantification, metamodel, Kriging, hyper-parameter estimation, 
1. Introduction

In many areas of science that involve measurements or data acquisition, one often has to
answer the question of how the set of experiments should be designed [13]. It is known that
in many situations, an irregular, or even random, spatial sampling is preferable to a regular
one. Examples of these situations are found in many fields. For numerical integration, Gaussian
quadrature rules generally yield irregular grids [16, ch.4]. The best known low-discrepancy
sequences for quasi-Monte Carlo methods (van der Corput, Halton, Sobol, Faure, Hammersley, ...)
are not regular either [2]. In the compressed sensing domain, it has been shown that one can
recover a signal very efficiently, and at a small cost, by using random measurements [3].

In this paper, we focus on the role of spatial sampling for meta-modeling. Meta-modeling
is particularly relevant for the analysis of complex computer models [20]. We will
address the case of Kriging models, which consist in interpolating the values of a Gaussian
random field given observations at a finite set of observation points. Kriging has become a
popular method for a large range of applications, such as numerical code approximation [19, 20]
and calibration [15], or global optimization.

One of the main issues regarding Kriging is the choice of the covariance function for the 
Gaussian process. Indeed, a Kriging model yields an unbiased predictor with minimal variance 
and a correct predictive variance only if the correct covariance function is used. The most 
common practice is to statistically estimate the covariance function, from a set of observations 
of the Gaussian process, and to plug [24, ch.6.8] the estimate in the Kriging equations. Usually, it
is assumed that the covariance function belongs to a given parametric family (see [1] for a review
of classical families). In this case, the estimation boils down to estimating the corresponding
parameters, which are called "hyper-parameters".

The spatial sampling, and particularly its degree of regularity, plays an important role for the
covariance function estimation. In his monograph, Stein shows that a highly irregular sampling,
with pairs of very close observation points, is preferable over a regular grid for the estimation
of the smoothness parameter of the Matérn model [24, ch.6.9]. It is shown in [32] that the
optimal samplings, for maximizing the log of the determinant of the Fisher information matrix, 
are extremely irregular. Therefore, a generally admitted conjecture is that irregular sampling is 
always better, for hyper-parameter estimation in Kriging, than regular sampling. 

In this paper, we address this conjecture in an asymptotic framework. Since exact finite- 
sample results are generally not reachable and not meaningful as they are specific to the situation, 
asymptotic theory is widely used to give approximations of the estimated hyper-parameter 
distribution. 

The two most studied asymptotic frameworks are the increasing-domain and fixed-domain
asymptotics [24, p.62]. In increasing-domain asymptotics, a minimal spacing exists between two
different observation points, so that the infinite sequence of observation points is unbounded. 
In fixed-domain asymptotics, the sequence is dense in a bounded domain. 

In fixed-domain asymptotics, significant results are available, concerning the estimation of the 
covariance function, and its influence on Kriging predictions. In this asymptotic framework, two 
types of covariance hyper-parameters can be distinguished: microergodic and non-microergodic 
hyper-parameters. Following the definition in [24], a hyper-parameter is microergodic if two
covariance functions are orthogonal whenever they differ for it (as in [24], we say that two
covariance functions are orthogonal if the two underlying Gaussian measures are orthogonal).
Non-microergodic hyper-parameters cannot be consistently estimated, but have no asymptotic
influence on Kriging predictions [21, 22, 23, 30]. On the contrary, there is a fair amount of
literature on consistently estimating microergodic hyper-parameters using the Maximum Like- 
lihood (ML) method. Consistency has been proved for several models [251 1251 ITT1 1301 [T01 [2J. 
Micro-ergodic hyper-parameters have an asymptotic influence on predictions, as shown in [271 
ch.5]. 

Nevertheless, the fixed-domain asymptotic framework is not well adapted to study the in- 
fluence of the irregularity of the spatial sampling on hyper-parameter estimation. Indeed, we 
would like to compare sampling techniques by inspection of the asymptotic distributions of the 
hyper-parameter estimators. In fixed-domain asymptotics, when an asymptotic distribution is 
proved for ML (25J |2_n| , it turns out that it is independent of the dense sequence of observation 
points. This makes it impossible to compare the effect of spatial sampling on hyper-parameter 
estimation using fixed-domain asymptotics techniques. 

The first characteristic of increasing-domain asymptotics is that, as shown in section 5, all the
hyper-parameters have strong asymptotic influences on predictions. The second characteristic
is that all the hyper-parameters (satisfying a very general identifiability assumption) can be
consistently estimated, and that asymptotic normality generally holds [26, 12]. Roughly
speaking, increasing-domain asymptotics is characterized by a vanishing dependence between 
distant observation points. As a result, a large sample size gives more and more information 
about the covariance structure. Finally, we show that the asymptotic variances of the hyper- 
parameter estimators strongly depend on the spatial sampling. This is why we address the 
increasing-domain asymptotic framework to study the influence of the spatial sampling on the 
hyper-parameter estimation. 

We propose a sequence of random spatial samplings of size $n \in \mathbb{N}^*$. The regularity of the
spatial sampling sequence is characterized by a regularity parameter $\epsilon \in (-\frac{1}{2}, \frac{1}{2})$.
$\epsilon = 0$ corresponds to a regular grid, and the irregularity increases with $|\epsilon|$. We study the
ML estimator, and also a Cross Validation (CV) estimator [25, 31], for which, to the best of our
knowledge, no asymptotic results are yet available in the literature. For both estimators, we prove
an asymptotic normality result for the estimation, with a $\sqrt{n}$ rate of convergence, and an
asymptotic covariance matrix which is a deterministic function of $\epsilon$. The asymptotic normality
yields, classically, approximate confidence intervals for finite-sample estimation. Then, carrying
out an exhaustive analysis of the asymptotic variance, for the one-dimensional Matérn model, we show
that irregularity is indeed an advantage for estimation in the majority of the cases. However, we can
determine the cases in which a regular grid performs better for estimation than irregular ones.
This definitively gives a negative answer to the claim that irregular sampling is always better
for hyper-parameter estimation than regular sampling.

The rest of the article is organized as follows. In section 2, we introduce the random sequence
of observation points, which is parameterized by the regularity parameter $\epsilon$. We also present the
ML and CV estimators. In section 3, we give the asymptotic normality results. In section 4,
we carry out an exhaustive study of the asymptotic variance. In Appendix A, we
prove the results of section 3, in Appendix B, we prove the results of section 4, and in
Appendix C, we state and prove several technical results. Finally, Appendix D
is dedicated to the one-dimensional case, with $\epsilon = 0$. We present an efficient calculation of the
asymptotic variances for ML and CV and of the second derivative of the asymptotic variance of
ML, at $\epsilon = 0$, using properties of Toeplitz matrix sequences.



2. Context 

2.1. Presentation and notation for the spatial sampling sequence 

We consider a stationary Gaussian process $Y$ on $\mathbb{R}^d$. We denote $\Theta = [\theta_{inf}, \theta_{sup}]^p$. The
covariance function of $Y$ is $K_{\theta_0}$ with $\theta_{inf} < (\theta_0)_i < \theta_{sup}$, for $1 \leq i \leq p$. $K_{\theta_0}$ belongs to a
parametric model $\{K_\theta, \theta \in \Theta\}$, with $K_\theta$ a stationary covariance function.

We shall assume the following condition for the parametric model, which is satisfied in all
the most classical cases, and especially for the Matérn model that we will analyze in detail in
section 4.

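Condition 2.1. • For all $\theta \in \Theta$, the covariance function $K_\theta$ is stationary.

• The covariance function $K_\theta$ is three times differentiable with respect to $\theta$. For all
$q \in \{0, ..., 3\}$, $i_1, ..., i_q \in \{1, ..., p\}$, there exists $C_{i_1,...,i_q} < +\infty$ so that for all $\theta \in \Theta$, $t \in \mathbb{R}^d$,

$$\left| \frac{\partial}{\partial \theta_{i_1}} ... \frac{\partial}{\partial \theta_{i_q}} K_\theta(t) \right| \leq \frac{C_{i_1,...,i_q}}{1 + |t|^{d+1}}, \tag{1}$$

where $|t|$ is the Euclidean norm of $t$. We define the Fourier transform of a function $h: \mathbb{R}^d \to \mathbb{R}$
by $\hat{h}(f) = \frac{1}{(2\pi)^d} \int_{\mathbb{R}^d} h(t) e^{-\mathrm{i} f \cdot t} dt$, where $\mathrm{i}^2 = -1$. Then, for all $\theta \in \Theta$, the covariance function
$K_\theta$ has a Fourier transform $\hat{K}_\theta$ that is continuous and bounded.

• For all $\theta \in \Theta$, $K_\theta$ satisfies

$$K_\theta(t) = \int_{\mathbb{R}^d} \hat{K}_\theta(f) e^{\mathrm{i} f \cdot t} df.$$

• $(\theta, f) \to \hat{K}_\theta(f)$ is continuous and positive on $\Theta \times \mathbb{R}^d$.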

We denote by $(v_i)_{i \in \mathbb{N}^*}$ a sequence of deterministic points in $(\mathbb{N}^*)^d$ so that for all $N \in \mathbb{N}^*$,
$\{v_i, 1 \leq i \leq N^d\} = \{1, ..., N\}^d$. $Y$ is observed at the points $v_i + \epsilon X_i$, $1 \leq i \leq n$, $n \in \mathbb{N}^*$, with
$-\frac{1}{2} < \epsilon < \frac{1}{2}$ and $X_i \sim_{iid} \mathcal{L}_X$. $\mathcal{L}_X$ is a symmetric probability law with support $S_X \subset [-1, 1]^d$,
and with a positive probability density on $S_X$. Two remarks can be made on this sequence of
observation points:

• This is an increasing-domain asymptotic context. The condition $-\frac{1}{2} < \epsilon < \frac{1}{2}$ ensures a
minimal spacing between two distinct observation points.

• The observation sequence we study is random, and the parameter $\epsilon$ is a regularity
parameter. $\epsilon = 0$ corresponds to a regular observation grid, while, when $|\epsilon|$ is close to $\frac{1}{2}$, the
observation set is highly irregular. Examples of observation sets are given in figure 1, with
$d = 2$, $n = 8^2$, and different values of $\epsilon$.

Figure 1: Examples of three perturbed grids. The dimension is $d = 2$ and the number of observation points is
$n = 8^2$. From left to right, the values of the regularity parameter are $0$, $\frac{1}{8}$ and $\frac{3}{8}$. $\epsilon = 0$ corresponds to a regular
observation grid, while, when $|\epsilon|$ is close to $\frac{1}{2}$, the observation set is highly irregular.
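
The sampling scheme described above can be sketched in a few lines. This is only an illustrative sketch: the function name `perturbed_grid` is hypothetical, the perturbations are assumed uniform on $[-1, 1]^d$ (one admissible choice of symmetric law), and the grid points are enumerated in an arbitrary order for a fixed $N$ rather than in the nested order required of the sequence $(v_i)$.

```python
import numpy as np

def perturbed_grid(N, d, eps, rng):
    """Return the n = N**d points v_i + eps * X_i, with X_i uniform on [-1, 1]^d."""
    if not -0.5 < eps < 0.5:
        raise ValueError("eps must lie in (-1/2, 1/2)")
    axes = [np.arange(1, N + 1)] * d
    # All points of the regular grid {1, ..., N}^d, one per row.
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, d)
    X = rng.uniform(-1.0, 1.0, size=grid.shape)  # assumed perturbation law
    return grid + eps * X

rng = np.random.default_rng(0)
pts = perturbed_grid(N=8, d=2, eps=0.375, rng=rng)
```

Because each coordinate is moved by at most $|\epsilon|$, two distinct points stay at least $1 - 2|\epsilon|$ apart in the sup norm, which is the minimal spacing mentioned in the first remark.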

We denote $C_{S_X} = \{t_1 - t_2, t_1 \in S_X, t_2 \in S_X\}$, the set of all possible differences between two
points in $S_X$. We denote, for $n \in \mathbb{N}^*$, $X = (X_1, ..., X_n)$, where we do not write explicitly
the dependence in $n$ for clarity. $X$ is a random variable with law $\mathcal{L}_X^{\otimes n}$. We also denote
$x = (x_1, ..., x_n)$, an element of $(S_X)^n$, as a realization of $X$.

We define the $n \times n$ random matrix $R_\theta$ by $(R_\theta)_{i,j} = K_\theta(v_i - v_j + \epsilon(X_i - X_j))$. We do
not write explicitly the dependence of $R_\theta$ with respect to $X$, $\epsilon$ and $n$. We shall denote, as a
simplification, $R := R_{\theta_0}$. We define the random vector $y$ of size $n$ by $y_i = Y(v_i + \epsilon X_i)$. We do
not write explicitly the dependence of $y$ with respect to $X$, $\epsilon$ and $n$.

We denote as in [8], for a real $n \times n$ matrix $A$, $|A|^2 = \frac{1}{n} \sum_{i,j=1}^n A_{i,j}^2$ and $\|A\|$ the largest
singular value of $A$. $|.|$ and $\|.\|$ are norms and $\|.\|$ is a matrix norm. We denote by $\phi_i(M)$,
$1 \leq i \leq n$, the eigenvalues of a symmetric matrix $M$. We denote, for two sequences of square
matrices $A$ and $B$, depending on $n \in \mathbb{N}^*$, $A \sim B$ if $|A - B| \to_{n \to +\infty} 0$ and $\|A\|$ and $\|B\|$ are
bounded with respect to $n$. Finally, for a square matrix $A$, we denote by $\mathrm{diag}(A)$ the matrix
obtained by setting to $0$ all non-diagonal elements of $A$.

Finally, for a sequence of real random variables $z_n$, we denote $z_n \to_p 0$ and $z_n = o_p(1)$ when
$z_n$ converges to zero in probability.

2.2. ML and CV estimators 

We denote $L_\theta := \frac{1}{n} \{\log(\det(R_\theta)) + y^t R_\theta^{-1} y\}$ the modified opposite log-likelihood, where
we do not write explicitly the dependence in $X$, $Y$, $n$ and $\epsilon$. We denote by $\hat{\theta}_{ML}$ the Maximum
Likelihood estimator, defined by

$$\hat{\theta}_{ML} \in \operatorname{argmin}_{\theta \in \Theta} L_\theta, \tag{2}$$

where we do not write explicitly the dependence of $\hat{\theta}_{ML}$ with respect to $X$, $Y$, $\epsilon$ and $n$.
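
As an illustration of the criterion $L_\theta$, the following sketch evaluates it through a Cholesky factorization and minimizes it over a grid of candidate values. Several choices here are assumptions made only for the example: $d = 1$, $p = 1$, an exponential correlation $\exp(-|t|/\ell)$ standing in for the parametric family, and a plain grid search instead of a numerical optimizer; the names `exp_cov` and `ml_criterion` are hypothetical.

```python
import numpy as np

def exp_cov(t, ell):
    """Exponential correlation exp(-|t| / ell); an illustrative choice only."""
    return np.exp(-np.abs(t) / ell)

def ml_criterion(ell, points, y):
    """L_theta = (1/n) * (log det R_theta + y^T R_theta^{-1} y) for 1-d points."""
    R = exp_cov(points[:, None] - points[None, :], ell)
    chol = np.linalg.cholesky(R)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    z = np.linalg.solve(chol, y)  # z = chol^{-1} y, so z @ z = y^T R^{-1} y
    return (log_det + z @ z) / len(y)

rng = np.random.default_rng(0)
n, eps, ell0 = 200, 0.3, 1.5
points = np.arange(1, n + 1) + eps * rng.uniform(-1.0, 1.0, n)
R0 = exp_cov(points[:, None] - points[None, :], ell0)
y = np.linalg.cholesky(R0) @ rng.standard_normal(n)  # one draw of the process

candidates = np.linspace(0.3, 3.0, 55)
ell_ml = candidates[np.argmin([ml_criterion(l, points, y) for l in candidates])]
```

The Cholesky route avoids forming $R_\theta^{-1}$ explicitly; it is a common numerical choice, not something prescribed by the text.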

Remark. The ML estimator in (2) is actually not entirely defined, since the likelihood function
of (2) can have more than one global minimizer. Nevertheless, the convergence results of $\hat{\theta}_{ML}$,
as $n \to +\infty$, hold when $\hat{\theta}_{ML}$ is any random variable belonging to the set of the global minimizers
of the likelihood of (2), regardless of the value chosen in this set. Furthermore, it can be shown
that, with probability converging to one, as $n \to \infty$ (see the proof of proposition C.10
in Appendix C), the likelihood function has a unique global minimum. To define a
measurable function $\hat{\theta}_{ML}$ of $Y$ and $X$, belonging to the set of the minimizers of the likelihood,
one possibility is the following. For a given realization of $Y$ and $X$, let $\mathcal{K}$ be the set of the
minimizers of the likelihood. Let $\mathcal{K}_0 = \mathcal{K}$ and, for $0 \leq k \leq p - 1$, $\mathcal{K}_{k+1}$ is the subset of $\mathcal{K}_k$
whose elements have their $(k+1)$-th coordinate equal to $\min\{\theta_{k+1}, \theta \in \mathcal{K}_k\}$. Since $\mathcal{K}$ is compact
(because the likelihood function is continuous with respect to $\theta$ and defined on the compact set
$\Theta$), the set $\mathcal{K}_p$ is composed of a unique element, that we define as $\hat{\theta}_{ML}$, which is a measurable
function of $X$ and $Y$. The same remark can be made for the Cross Validation estimator of (3).
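
The selection rule of the remark amounts to a lexicographic minimum. A minimal sketch, assuming the set of minimizers is available as a finite list of candidate vectors (in the remark it is a general compact set); the function name is hypothetical:

```python
import numpy as np

def lexicographic_minimizer(minimizers):
    """Pick the element obtained by minimizing coordinate 1, then 2, ... in turn."""
    K = np.asarray(minimizers, dtype=float)
    for k in range(K.shape[1]):
        # K_{k+1}: elements of K_k whose (k+1)-th coordinate is minimal.
        K = K[K[:, k] == K[:, k].min()]
    return K[0]

chosen = lexicographic_minimizer([[1.0, 3.0], [1.0, 2.0], [2.0, 0.0]])
```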

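When the increasing-domain asymptotics sequence of observation points is deterministic, it
is shown in [12] that $\hat{\theta}_{ML} - \theta_0$ is asymptotically distributed as a centered Gaussian random
vector, whose covariance matrix is the inverse of the Fisher information matrix. For fixed $n$, the Fisher
information matrix is the $p \times p$ matrix with $(i,j)$th element equal to
$\frac{1}{2} \mathrm{Tr}\left( R_{\theta_0}^{-1} \frac{\partial R_{\theta_0}}{\partial \theta_i} R_{\theta_0}^{-1} \frac{\partial R_{\theta_0}}{\partial \theta_j} \right)$.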
Since the literature has not addressed yet the asymptotic distribution of $\hat{\theta}_{ML}$ in increasing-domain
asymptotics with random observation points, we give complete proofs about it in
Appendix A. Our techniques are original and not specifically oriented towards ML, contrary to
[26, 12], so that they allow us to address the asymptotic distribution of the CV estimator in
the same fashion.

The CV estimator is defined by

$$\hat{\theta}_{CV} \in \operatorname{argmin}_{\theta \in \Theta} \sum_{i=1}^n \{y_i - \hat{y}_{i,\theta}(y_{-i})\}^2, \tag{3}$$

where, for $1 \leq i \leq n$, $\hat{y}_{i,\theta}(y_{-i}) := E_{\theta|X}(y_i | y_1, ..., y_{i-1}, y_{i+1}, ..., y_n)$ is the Kriging Leave-One-Out
prediction of $y_i$ with covariance hyper-parameters $\theta$. $E_{\theta|X}$ denotes the expectation with respect
to the distribution of $Y$ with the covariance function $K_\theta$, given $X$.

The CV estimator selects the hyper-parameters according to the criterion of the point-wise
prediction errors. This criterion does not involve the Kriging predictive variances. Hence, the
CV estimator of (3) cannot estimate a hyper-parameter impacting only the variance of
the Gaussian process. Nevertheless, all the classical parametric models $\{K_\theta, \theta \in \Theta\}$ satisfy the
decomposition $\theta = (\sigma^2, \tilde{\theta})$ and $\{K_\theta, \theta \in \Theta\} = \{\sigma^2 \tilde{K}_{\tilde{\theta}}, \sigma^2 > 0, \tilde{\theta} \in \tilde{\Theta}\}$, with $\tilde{K}_{\tilde{\theta}}$ a correlation
function. Hence, in this case, $\tilde{\theta}$ would be estimated by (3), and $\sigma^2$ would be estimated by the
equation

$$\hat{\sigma}^2_{CV}(\tilde{\theta}) = \frac{1}{n} \sum_{i=1}^n \frac{\{y_i - \hat{y}_{i,\tilde{\theta}}(y_{-i})\}^2}{c^2_{i,-i,\tilde{\theta}}},$$

where $c^2_{i,-i,\tilde{\theta}} := var_{\tilde{\theta}|X}(y_i | y_1, ..., y_{i-1}, y_{i+1}, ..., y_n)$ is
the Kriging Leave-One-Out predictive variance for $y_i$ with hyper-parameters $\sigma^2 = 1$ and $\tilde{\theta}$.
$var_{\tilde{\theta}|X}$ denotes the variance with respect to the distribution of $Y$ with the covariance function
$K_\theta$, $\theta = (1, \tilde{\theta})$, given $X$. To summarize, the general CV procedure we study is a two-step
procedure. In a first step, the correlation hyper-parameters are selected according to a mean
square error criterion. In a second step, the global variance hyper-parameter is selected, so that
the predictive variances are adapted to the Leave-One-Out prediction errors. Here, we address
the first step, so we focus on the CV estimator defined in (3).

The criterion (3) can be computed with a single matrix inversion, by means of virtual LOO
formulas (see e.g. [18, ch.5.2] for the zero-mean case addressed here, and [7] for the universal
Kriging case). These virtual LOO formulas yield

$$\sum_{i=1}^n \{y_i - \hat{y}_{i,\theta}(y_{-i})\}^2 = y^t R_\theta^{-1} \mathrm{diag}(R_\theta^{-1})^{-2} R_\theta^{-1} y,$$

which is useful both in practice (to compute quickly the LOO errors) and in the proofs on CV.
We then define

$$CV_\theta := \frac{1}{n} y^t R_\theta^{-1} \mathrm{diag}(R_\theta^{-1})^{-2} R_\theta^{-1} y$$

as the CV criterion, where we do not write explicitly the dependence in $X$, $n$, $Y$ and $\epsilon$. Hence
we have, equivalently to (3), $\hat{\theta}_{CV} \in \operatorname{argmin}_{\theta \in \Theta} CV_\theta$.
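
The virtual LOO identity can be checked numerically against a brute-force loop that refits the Kriging predictor with each point removed. The sketch below uses the standard identity that the $i$-th LOO error equals $(R_\theta^{-1} y)_i / (R_\theta^{-1})_{i,i}$ in the zero-mean case; the exponential correlation, the sample size and the function names are assumptions chosen for the illustration.

```python
import numpy as np

def cv_criterion(R, y):
    """CV_theta = (1/n) y^T R^{-1} diag(R^{-1})^{-2} R^{-1} y (single inversion)."""
    R_inv = np.linalg.inv(R)
    loo_errors = (R_inv @ y) / np.diag(R_inv)  # y_i minus its LOO prediction
    return np.mean(loo_errors ** 2)

def cv_criterion_brute_force(R, y):
    """Same quantity, recomputing the zero-mean Kriging predictor n times."""
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        weights = np.linalg.solve(R[np.ix_(keep, keep)], R[keep, i])
        total += (y[i] - weights @ y[keep]) ** 2
    return total / n

rng = np.random.default_rng(0)
n = 30
points = np.arange(1, n + 1) + 0.3 * rng.uniform(-1.0, 1.0, n)
R = np.exp(-np.abs(points[:, None] - points[None, :]) / 1.5)
y = rng.standard_normal(n)
fast, slow = cv_criterion(R, y), cv_criterion_brute_force(R, y)
```

The two values agree up to rounding, while the first costs one inversion instead of $n$ linear solves.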
Since the asymptotic covariance matrix of the ML estimator is the inverse of the Fisher
information matrix, this estimator should be used when $\theta_0 \in \Theta$ holds, which is the case of
interest here. However, in practice, it is likely that the true covariance function of the Gaussian
process does not belong to the parametric family used for the estimation. In [3], this case is
called the model misspecification case, and it is shown that CV is more efficient than ML in this
case. Hence, the CV estimator is relevant in practice, which is a reason for studying it in the
well-specified case $\theta_0 \in \Theta$ addressed here. Hence we aim at studying the influence of the spatial
sampling on CV as well as on ML. Furthermore, since it is expected that ML performs better
than CV in the well-specified case, we are interested in quantifying this fact.



3. Consistency and asymptotic normality 



Proposition 3.1 addresses the consistency of the ML estimator. The only assumption on
the parametric family of covariance functions is an identifiability assumption. Basically, for a
fixed $\epsilon$, there should not exist two distinct hyper-parameters so that the two associated covariance
functions are the same, on the set of inter-point distances covered by the random spatial
sampling. The identifiability assumption is clearly minimal.

Proposition 3.1. Assume that condition 2.1 is satisfied.

For $\epsilon = 0$, if there does not exist $\theta \neq \theta_0$ so that $K_\theta(v) = K_{\theta_0}(v)$ for all $v \in \mathbb{Z}^d$, then the ML
estimator is consistent.

For $\epsilon \neq 0$, we denote $D_\epsilon = \cup_{v \in \mathbb{Z}^d \setminus 0} (v + \epsilon C_{S_X})$, with $C_{S_X} = \{t_1 - t_2, t_1 \in S_X, t_2 \in S_X\}$.
Then, if there does not exist $\theta \neq \theta_0$ so that $K_\theta = K_{\theta_0}$ a.s. on $D_\epsilon$, according to the Lebesgue
measure on $D_\epsilon$, and $K_\theta(0) = K_{\theta_0}(0)$, then the ML estimator is consistent.



In proposition 3.2, we address the asymptotic normality of ML. The convergence rate is
$\sqrt{n}$, as in a classical iid framework, and we prove the existence of a deterministic asymptotic
covariance matrix of $\sqrt{n} \hat{\theta}_{ML}$ which depends only on the regularity parameter $\epsilon$. In proposition
3.3, we prove that this asymptotic covariance matrix is positive, as long as the different derivative
functions with respect to $\theta$ at $\theta_0$ of the covariance function are non redundant on the set of inter-point
distances covered by the random spatial sampling. This condition is minimal, since when
these derivatives are redundant, the Fisher information matrix is singular for all finite sample-size
$n$ and its kernel is independent of $n$.

Proposition 3.2. Assume that condition 2.1 is satisfied.

For all $1 \leq i,j \leq p$, the random trace $\frac{1}{n} \mathrm{Tr}\left( R^{-1} \frac{\partial R}{\partial \theta_i} R^{-1} \frac{\partial R}{\partial \theta_j} \right)$ converges a.s. to the element
$(\Sigma_{ML})_{i,j}$ of a $p \times p$ deterministic matrix $\Sigma_{ML}$ as $n \to +\infty$.

If $\hat{\theta}_{ML}$ is consistent and if $\Sigma_{ML}$ is positive, then

$$\sqrt{n} \left( \hat{\theta}_{ML} - \theta_0 \right) \to \mathcal{N}\left( 0, 2 \Sigma_{ML}^{-1} \right).$$



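Proposition 3.3. Assume that condition 2.1 is satisfied.

For $\epsilon = 0$, if there does not exist $v_\lambda = (\lambda_1, ..., \lambda_p) \in \mathbb{R}^p$, $v_\lambda$ different from zero, so that
$\sum_{k=1}^p \lambda_k \frac{\partial}{\partial \theta_k} K_{\theta_0}(v) = 0$ for all $v \in \mathbb{Z}^d$, then $\Sigma_{ML}$ is positive.

For $\epsilon \neq 0$, we denote $D_\epsilon = \cup_{v \in \mathbb{Z}^d \setminus 0} (v + \epsilon C_{S_X})$, with $C_{S_X} = \{t_1 - t_2, t_1 \in S_X, t_2 \in S_X\}$. If
there does not exist $v_\lambda = (\lambda_1, ..., \lambda_p) \in \mathbb{R}^p$, $v_\lambda$ different from zero, so that $t \to \sum_{k=1}^p \lambda_k \frac{\partial}{\partial \theta_k} K_{\theta_0}(t)$
is almost surely zero on $D_\epsilon$, with respect to the Lebesgue measure on $D_\epsilon$, and that $\sum_{k=1}^p \lambda_k \frac{\partial}{\partial \theta_k} K_{\theta_0}(0)$
is null, then $\Sigma_{ML}$ is positive.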



Proposition 3.4 addresses the consistency of the CV estimator. The identifiability assumption
is required, like in the ML case. Since the CV estimator is designed for estimating correlation
hyper-parameters, we assume that the parametric model $\{K_\theta, \theta \in \Theta\}$ contains only correlation
functions. This assumption holds in most classical cases, and yields results that are easy to
express and interpret. Hybrid hyper-parameters, specifying both a variance and a
correlation structure, should also be consistently estimated by the CV estimator. Nevertheless, since
such hyper-parameters do not exist in the classical families of covariance functions, we do not
address this case.

Proposition 3.4. Assume that condition 2.1 is satisfied and that for all $\theta \in \Theta$, $K_\theta(0) = 1$.

For $\epsilon = 0$, if there does not exist $\theta \neq \theta_0$ so that $K_\theta(v) = K_{\theta_0}(v)$ for all $v \in \mathbb{Z}^d$, then the CV
estimator is consistent.

For $\epsilon \neq 0$, we denote $D_\epsilon = \cup_{v \in \mathbb{Z}^d \setminus 0} (v + \epsilon C_{S_X})$, with $C_{S_X} = \{t_1 - t_2, t_1 \in S_X, t_2 \in S_X\}$.
Then, if there does not exist $\theta \neq \theta_0$ so that $K_\theta = K_{\theta_0}$ a.s. on $D_\epsilon$, with respect to the Lebesgue
measure on $D_\epsilon$, the CV estimator is consistent.



Proposition 3.5 gives the expression of the covariance matrix of the gradient of the CV
criterion $CV_\theta$ and of the mean matrix of its Hessian. These moments are classically used in
statistics to prove asymptotic distributions of consistent estimators. We also prove the convergence
of these moments, the limit matrices being functions of the $p \times p$ matrices $\Sigma_{CV,1}$ and
$\Sigma_{CV,2}$, for which we prove the existence. These matrices are deterministic and depend only on
the regularity parameter $\epsilon$.



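Proposition 3.5. Assume that condition 2.1 is satisfied.
With, for $1 \leq i \leq p$,

$$M_\theta^i = R_\theta^{-1} \mathrm{diag}(R_\theta^{-1})^{-2} \left\{ \mathrm{diag}\left( R_\theta^{-1} \frac{\partial R_\theta}{\partial \theta_i} R_\theta^{-1} \right) \mathrm{diag}(R_\theta^{-1})^{-1} - R_\theta^{-1} \frac{\partial R_\theta}{\partial \theta_i} \right\} R_\theta^{-1},$$

we have, for all $1 \leq i,j \leq p$,

$$\frac{\partial}{\partial \theta_i} CV_\theta = \frac{2}{n} y^t M_\theta^i y,$$

and

$$cov\left( \sqrt{n} \frac{\partial}{\partial \theta_i} CV_{\theta_0}, \sqrt{n} \frac{\partial}{\partial \theta_j} CV_{\theta_0} \Big| X \right) = 2 \frac{1}{n} \mathrm{Tr}\left[ \left\{ M_{\theta_0}^i + (M_{\theta_0}^i)^t \right\} R_{\theta_0} \left\{ M_{\theta_0}^j + (M_{\theta_0}^j)^t \right\} R_{\theta_0} \right]. \tag{4}$$

Furthermore, the random trace in (4) converges a.s. to the element $(\Sigma_{CV,1})_{i,j}$ of a $p \times p$
deterministic matrix $\Sigma_{CV,1}$ as $n \to +\infty$.
We also have, with $R = R_{\theta_0}$ and $\frac{\partial R}{\partial \theta_i}$ the derivative of $R_\theta$ at $\theta_0$,

$$E\left( \frac{\partial^2}{\partial \theta_i \partial \theta_j} CV_{\theta_0} \Big| X \right) = -8 \frac{1}{n} \mathrm{Tr}\left\{ \mathrm{diag}(R^{-1})^{-3} \mathrm{diag}\left( R^{-1} \frac{\partial R}{\partial \theta_i} R^{-1} \right) R^{-1} \frac{\partial R}{\partial \theta_j} R^{-1} \right\}$$
$$+ 2 \frac{1}{n} \mathrm{Tr}\left\{ \mathrm{diag}(R^{-1})^{-2} R^{-1} \frac{\partial R}{\partial \theta_i} R^{-1} \frac{\partial R}{\partial \theta_j} R^{-1} \right\}$$
$$+ 6 \frac{1}{n} \mathrm{Tr}\left\{ \mathrm{diag}(R^{-1})^{-4} \mathrm{diag}\left( R^{-1} \frac{\partial R}{\partial \theta_i} R^{-1} \right) \mathrm{diag}\left( R^{-1} \frac{\partial R}{\partial \theta_j} R^{-1} \right) R^{-1} \right\}. \tag{5}$$

Furthermore, the random trace in (5) converges a.s. to the element $(\Sigma_{CV,2})_{i,j}$ of a $p \times p$
deterministic matrix $\Sigma_{CV,2}$ as $n \to +\infty$.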
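
The gradient expression of the proposition can be sanity-checked against a finite difference of the CV criterion. The sketch below takes $p = 1$, $d = 1$ and an exponential correlation $\exp(-|t|/\ell)$, all of which are assumptions made for the illustration; the matrix `M` follows from differentiating $y^t R_\theta^{-1} \mathrm{diag}(R_\theta^{-1})^{-2} R_\theta^{-1} y$ with respect to the parameter, and the function names are hypothetical.

```python
import numpy as np

def cov_and_derivative(points, ell):
    """R_theta and dR_theta/d ell for the exponential correlation exp(-|t|/ell)."""
    dist = np.abs(points[:, None] - points[None, :])
    R = np.exp(-dist / ell)
    return R, R * dist / ell**2

def cv_value(points, y, ell):
    R, _ = cov_and_derivative(points, ell)
    R_inv = np.linalg.inv(R)
    return np.mean(((R_inv @ y) / np.diag(R_inv)) ** 2)

def cv_gradient(points, y, ell):
    """(2/n) y^T M y with M = R^-1 D^-2 { D_i D^-1 - R^-1 dR } R^-1."""
    R, dR = cov_and_derivative(points, ell)
    R_inv = np.linalg.inv(R)
    D = np.diag(R_inv)                  # diagonal of R^{-1}
    D_i = np.diag(R_inv @ dR @ R_inv)   # diagonal of R^{-1} dR R^{-1}
    inner = np.diag(D_i / D) - R_inv @ dR
    M = R_inv @ np.diag(D ** -2.0) @ inner @ R_inv
    return 2.0 / len(y) * (y @ M @ y)

rng = np.random.default_rng(0)
n = 40
points = np.arange(1, n + 1) + 0.2 * rng.uniform(-1.0, 1.0, n)
y = rng.standard_normal(n)
ell, h = 1.2, 1e-5
finite_diff = (cv_value(points, y, ell + h) - cv_value(points, y, ell - h)) / (2 * h)
analytic = cv_gradient(points, y, ell)
```

Agreement between `finite_diff` and `analytic` only checks the algebra of the gradient; it says nothing about the asymptotic statements.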



In proposition 3.6, we address the asymptotic normality of CV. The conditions are, as for
the consistency, identifiability and that the set of covariance functions contains only correlation
functions. The convergence rate is also $\sqrt{n}$, and we have the expression of the deterministic
asymptotic covariance matrix of $\sqrt{n} \hat{\theta}_{CV}$, depending only on the matrices $\Sigma_{CV,1}$ and $\Sigma_{CV,2}$ of
proposition 3.5. In proposition 3.7, we prove that the asymptotic matrix $\Sigma_{CV,2}$ is positive. The
minimal assumption is, as for the ML case, that the different derivative functions with respect to
$\theta$ at $\theta_0$ of the covariance function are non redundant on the set of inter-point distances covered
by the random spatial sampling.

Proposition 3.6. Assume that condition 2.1 is satisfied.
If $\hat{\theta}_{CV}$ is consistent and if $\Sigma_{CV,2}$ is positive, then

$$\sqrt{n} \left( \hat{\theta}_{CV} - \theta_0 \right) \to \mathcal{N}\left( 0, \Sigma_{CV,2}^{-1} \Sigma_{CV,1} \Sigma_{CV,2}^{-1} \right).$$



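Proposition 3.7. Assume that condition 2.1 is satisfied and that for all $\theta \in \Theta$, $K_\theta(0) = 1$.

For $\epsilon = 0$, if there does not exist $v_\lambda = (\lambda_1, ..., \lambda_p) \in \mathbb{R}^p$, $v_\lambda$ different from zero, so that
$\sum_{k=1}^p \lambda_k \frac{\partial}{\partial \theta_k} K_{\theta_0}(v) = 0$ for all $v \in \mathbb{Z}^d$, then $\Sigma_{CV,2}$ is positive.

For $\epsilon \neq 0$, we denote $D_\epsilon = \cup_{v \in \mathbb{Z}^d \setminus 0} (v + \epsilon C_{S_X})$, with $C_{S_X} = \{t_1 - t_2, t_1 \in S_X, t_2 \in S_X\}$. If
there does not exist $v_\lambda = (\lambda_1, ..., \lambda_p) \in \mathbb{R}^p$, $v_\lambda$ different from zero, so that $t \to \sum_{k=1}^p \lambda_k \frac{\partial}{\partial \theta_k} K_{\theta_0}(t)$
is almost surely zero on $D_\epsilon$, with respect to the Lebesgue measure on $D_\epsilon$, then $\Sigma_{CV,2}$ is positive.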

The conclusion for ML and CV is that, for all the most classical parametric families of
covariance functions, consistency and asymptotic normality hold, with deterministic positive
asymptotic covariance matrices depending only on the regularity parameter $\epsilon$. Therefore, these
covariance matrices are analyzed in section 4, to address the influence of the irregularity of the
spatial sampling on the ML and CV estimation.



4. Analysis of the asymptotic variances 



The limit distributions of the ML and CV estimators only depend on the regularity parameter
$\epsilon$ through the asymptotic covariance matrices in propositions 3.2 and 3.6. The aim of this section
is to numerically study the influence of $\epsilon$ on these asymptotic covariance matrices.

The asymptotic covariance matrices of propositions 3.2 and 3.6 are expressed as functions of
a.s. limits of traces of sums, products and inverses of random matrices. In the case $\epsilon = 0$, for
$d = 1$, these matrices are deterministic Toeplitz matrices, so that the limits can be expressed
using Fourier transform techniques (see [5]). In Appendix D, we give the closed form
expression of $\Sigma_{ML}$, $\Sigma_{CV,1}$ and $\Sigma_{CV,2}$ for $\epsilon = 0$ and $d = 1$. In the case $\epsilon \neq 0$, there does not
exist, to the best of our knowledge, any random matrix technique that would give a closed form
expression of $\Sigma_{ML}$, $\Sigma_{CV,1}$ and $\Sigma_{CV,2}$. Therefore, for the numerical study with $\epsilon \neq 0$, these
matrices will be approximated by the random traces for large $n$.
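
The approximation by random traces can be sketched as follows for $p = 1$ and $d = 1$. The assumptions are stated loudly: an exponential correlation $\exp(-|t|/\ell)$ replaces the Matérn family studied in the paper, perturbations are uniform on $[-1, 1]$, the random trace is normalized as $\frac{1}{n} \mathrm{Tr}(R^{-1} \partial R \, R^{-1} \partial R)$, and the asymptotic variance is taken as twice its inverse, which is the normalization consistent with the inverse Fisher information $\frac{1}{2}\mathrm{Tr}(\cdot)$ given earlier. Function names are hypothetical.

```python
import numpy as np

def random_trace_ml(n, eps, ell, rng):
    """(1/n) Tr(R^{-1} dR R^{-1} dR) for one draw of the perturbed grid."""
    points = np.arange(1, n + 1) + eps * rng.uniform(-1.0, 1.0, n)
    dist = np.abs(points[:, None] - points[None, :])
    R = np.exp(-dist / ell)
    dR = R * dist / ell**2            # derivative of R with respect to ell
    A = np.linalg.solve(R, dR)        # A = R^{-1} dR
    return np.sum(A * A.T) / n        # Tr(A A) / n

def asymptotic_variance_ml(n, eps, ell, n_draws, rng):
    """Monte Carlo approximation of 2 / Sigma_ML for large n."""
    sigma_ml = np.mean([random_trace_ml(n, eps, ell, rng) for _ in range(n_draws)])
    return 2.0 / sigma_ml

rng = np.random.default_rng(0)
variances = {eps: asymptotic_variance_ml(200, eps, 1.0, 5, rng)
             for eps in (0.0, 0.2, 0.4)}
```

Averaging over a few independent draws reduces the fluctuation of the random trace at finite $n$; for $\epsilon = 0$ the trace is deterministic and a single draw suffices.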

4.1. The derivatives of $\Sigma_{ML}$, $\Sigma_{CV,1}$ and $\Sigma_{CV,2}$

In proposition 4.2, we show that, under the mild condition 4.1, the asymptotic covariance matrices
obtained from $\Sigma_{ML}$, $\Sigma_{CV,1}$ and $\Sigma_{CV,2}$, of propositions 3.2 and 3.6, are twice differentiable
with respect to $\epsilon$. This result is useful for the numerical study of the next subsections.
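Condition 4.1. • Condition 2.1 is satisfied.

• $K_\theta(t)$ and $\frac{\partial}{\partial \theta_i} K_\theta(t)$, for $1 \leq i \leq p$, are three times differentiable in $t$ for $t \neq 0$.

• For all $T > 0$, $\theta \in \Theta$, $1 \leq i \leq p$, $k \in \{1, 2, 3\}$, $i_1, ..., i_k \in \{1, ..., d\}^k$, there exists $C_T < +\infty$
so that for $|t| \geq T$,

$$\left| \frac{\partial}{\partial t_{i_1}} ... \frac{\partial}{\partial t_{i_k}} K_\theta(t) \right| \leq \frac{C_T}{1 + |t|^{d+1}}, \qquad \left| \frac{\partial}{\partial t_{i_1}} ... \frac{\partial}{\partial t_{i_k}} \frac{\partial}{\partial \theta_i} K_\theta(t) \right| \leq \frac{C_T}{1 + |t|^{d+1}}. \tag{6}$$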



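Proposition 4.2. Assume that condition 4.1 is satisfied.

Let us fix $1 \leq i,j \leq p$. The elements $(\Sigma_{ML})_{i,j}$, $(\Sigma_{CV,1})_{i,j}$ and $(\Sigma_{CV,2})_{i,j}$ (as defined in
propositions 3.2 and 3.5) are $C^2$ in $\epsilon$ on $[0, \frac{1}{2})$. Furthermore, with $\frac{1}{n} E\{\mathrm{Tr}(M_{ML})\} \to (\Sigma_{ML})_{i,j}$,
$\frac{1}{n} E\{\mathrm{Tr}(M_{CV,1})\} \to (\Sigma_{CV,1})_{i,j}$ and $\frac{1}{n} E\{\mathrm{Tr}(M_{CV,2})\} \to (\Sigma_{CV,2})_{i,j}$ (propositions 3.2 and 3.5),
we have, for $(\Sigma)_{i,j}$ being $(\Sigma_{ML})_{i,j}$, $(\Sigma_{CV,1})_{i,j}$ or $(\Sigma_{CV,2})_{i,j}$ and $M$ being $M_{ML}$, $M_{CV,1}$ or $M_{CV,2}$,

$$\frac{\partial^2}{\partial \epsilon^2} (\Sigma)_{i,j} = \lim_{n \to +\infty} \frac{1}{n} E\left\{ \frac{\partial^2}{\partial \epsilon^2} \mathrm{Tr}(M) \right\}.$$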


Proposition 4.2 shows that we can compute numerically the derivatives of $\Sigma_{ML}$, $\Sigma_{CV,j}$,
$j = 1, 2$, with respect to $\epsilon$ by computing the derivatives of $M_{ML}$, $M_{CV,j}$, $j = 1, 2$, for $n$ large.
The fact that it is possible to exchange the limit in $n$ and the derivative in $\epsilon$ was not a priori
obvious.

In the rest of the section, we address specifically the case where $p = 1$, $d = 1$, and the law
of the $X_i$, $1 \leq i \leq n$, is uniform on $[-1, 1]$. We focus on the case of the Matérn covariance
function. In dimension one, this covariance model is parameterized by the correlation length $\ell$
and the smoothness parameter $\nu$. The covariance function $K_{\ell,\nu}$ is Matérn $(\ell, \nu)$, and its expression
involves $\Gamma$, the Gamma function, and $K_\nu$, the modified Bessel function of the second kind. See e.g. [24,
p.31] for a presentation of the Matérn correlation function.



4.2. Small random perturbations of the regular grid

In our study, the two true hyper-parameters $(\ell_0, \nu_0)$ vary over $0.3 \leq \ell_0 \leq 3$ and $0.5 \leq \nu_0 \leq 5$.
We will successively address the two cases where $\ell$ is estimated and $\nu$ is known, and where
$\nu$ is estimated and $\ell$ is known. It is shown in section 4.1 that for both ML and CV, the
asymptotic variances are regular functions of $\epsilon$. Of course, they are even functions of $\epsilon$, so
that the quantity of interest is the ratio of the second derivative with respect to $\epsilon$ at $\epsilon = 0$ of
the asymptotic variance over its value at $\epsilon = 0$. When this quantity is negative, this means
that the asymptotic variance of the hyper-parameter estimator decays with $|\epsilon|$, and therefore
that an irregular sampling is more favorable for hyper-parameter estimation than a regular one.
The second derivative is calculated exactly for ML, using the results of Appendix D,
and is approximated by finite differences for $n$ large for CV. Proposition 4.2 ensures that this
approximation is numerically consistent (because the limits in $n$ and the derivatives in $\epsilon$ are
exchangeable).
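
For an even function of $\epsilon$, the ratio discussed above can be approximated from only two evaluations, since $f(h) \approx f(0) + \frac{h^2}{2} f''(0)$. The sketch below is a generic finite-difference helper under that evenness assumption; the function name, the step size and the toy quadratic are hypothetical and do not come from the paper.

```python
def curvature_ratio(variance, h=1e-2):
    """Approximate f''(0) / f(0) for an even function f using f(h) and f(0)."""
    f0 = variance(0.0)
    second_derivative = 2.0 * (variance(h) - f0) / h**2
    return second_derivative / f0

# Toy even function standing in for an asymptotic variance.
ratio = curvature_ratio(lambda eps: 2.0 - 3.0 * eps**2)
```

A negative ratio corresponds to an asymptotic variance that locally decreases when the grid is perturbed. When the variance itself is a Monte Carlo estimate, the step `h` has to be large enough for the difference to dominate the simulation noise.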

On figure 2, we show the numerical results for the estimation of $\ell$. First we see that the
relative improvement of the estimation due to irregularity is maximum when the true correlation
length $\ell_0$ is small. Indeed, the inter-observation distance being 1, a correlation length of
approximately 0.3 means that the observations are almost independent, making the estimation
of the covariance very hard. Hence, the irregularity of the grid creates pairs of observations that
are less independent and makes the estimation possible. For large $\ell_0$, this phenomenon does
not take place anymore, and thus the relative effect of the irregularity is smaller. Second, we
observe that for ML the irregularity is always an advantage for estimation. This is not the case
for CV, where the asymptotic variance can increase with $\epsilon$. Finally, we can see that the two
particular points $(\ell_0 = 0.5, \nu_0 = 5)$ and $(\ell_0 = 2.7, \nu_0 = 1)$ are particularly interesting and
representative, since $(\ell_0 = 0.5, \nu_0 = 5)$ corresponds to hyper-parameters for which the irregularity
of the sampling has a strong and favorable impact on the estimation for ML and CV, while
$(\ell_0 = 2.7, \nu_0 = 1)$ corresponds to hyper-parameters for which the irregularity of the sampling
has an unfavorable impact on the estimation for CV. We retain these two points for further
investigation for $0 \leq \epsilon \leq 0.45$ in subsection 4.3.

Figure 2: Local influence of $\epsilon$ for the estimation of the correlation length $\ell$. Plot of the ratio of the second
derivative of the asymptotic variance over its value at $\epsilon = 0$, for ML (left) and CV (right). The true covariance
function is Matérn with varying $\ell_0$ and $\nu_0$. The advantage of perturbing the regular grid is maximum when the
correlation length $\ell_0$ is small, i.e. when the observations are almost independent. The asymptotic variance always
locally decreases with $\epsilon$ for ML (i.e. the second derivative at $\epsilon = 0$ is always negative) but not for CV. We retain
the two particular points $(\ell_0 = 0.5, \nu_0 = 5)$ and $(\ell_0 = 2.7, \nu_0 = 1)$ for further investigation in subsection 4.3
(these are the black dots).

On figure 3, we show the numerical results for the estimation of $\nu$. We observe that for $\ell_0$
relatively small, the asymptotic variance is an increasing function of $\epsilon$ (for small $\epsilon$). This happens
approximately in the band $0.4 \leq \ell_0 \leq 0.6$, and for both ML and CV. This fact is not easy to
interpret but it definitely gives a negative answer to the claim that irregular sampling is always
better for hyper-parameter estimation than regular sampling. For $0.6 \leq \ell_0 \leq 0.8$ and $\nu_0 \geq 2$,
the relative improvement is maximum. This improvement remains significant, though smaller,
for larger $\ell_0$. Finally, we see the three particular points $(\ell_0 = 0.5, \nu_0 = 2.5)$, $(\ell_0 = 0.7, \nu_0 = 2.5)$
and $(\ell_0 = 2.7, \nu_0 = 2.5)$ as representative of the discussion above, and we retain them for further
investigation for $0 \leq \epsilon \leq 0.45$ in subsection 4.3.



4.3. Large random perturbations of the regular grid

In this subsection, we plot the asymptotic variances of propositions 3.2 and 3.6 as functions
of $\epsilon$ for $-0.45 < \epsilon < 0.45$. The asymptotic variances are even functions of $\epsilon$. Nevertheless, they
are approximated by empirical means of iid realizations of the random traces in propositions 3.2
and 3.5, for $n$ large enough. Hence, the functions we plot are not exactly even. The fact that
they are almost even is a graphical verification that the random fluctuations of the results of the
calculations, for finite (but large) $n$, are very small. We also plot the second-order Taylor-series
expansion given by the value at $\epsilon = 0$ and the second derivative at $\epsilon = 0$.
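
The second-order expansion mentioned above can be sketched as follows. This is a minimal illustration, not the computation used for the figures: the function `toy_variance` is a hypothetical even function standing in for an asymptotic variance, and the second derivative is estimated by a central finite difference (an assumption of this sketch).

```python
def second_derivative_at_zero(f, h=1e-3):
    """Central finite-difference estimate of f''(0)."""
    return (f(h) - 2.0 * f(0.0) + f(-h)) / (h * h)

def taylor_second_order(f, h=1e-3):
    """Return eps -> f(0) + 0.5 * f''(0) * eps**2.

    The first-order term is omitted because f is assumed even, so f'(0) = 0.
    """
    f0 = f(0.0)
    f2 = second_derivative_at_zero(f, h)
    return lambda eps: f0 + 0.5 * f2 * eps * eps

# Hypothetical even function, used only for illustration.
def toy_variance(eps):
    return 1.0 / (1.0 + 4.0 * eps * eps)

approx = taylor_second_order(toy_variance)
```

Comparing `approx(eps)` with `toy_variance(eps)` on a grid of values shows how far from zero the local (second-derivative) description remains informative, which is the purpose of plotting both curves.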

On figure 4, we show the numerical results for the estimation of $\ell$ with $(\ell_0 = 0.5, \nu_0 = 5)$.
The first observation is that the asymptotic variance is slightly larger for CV than for ML. This
is expected: indeed we address a well-specified case, so that the asymptotic variance of ML is
the almost sure limit of the Cramér-Rao bound (the true covariance function belongs to the
parametric family of covariance functions, see [3]). Therefore, this observation turns out to be
true throughout the subsection, and we will not comment on it anymore. We see that, for both ML
and CV, the improvement of the estimation given by the irregularity of the spatial sampling
holds for all values of $\epsilon$. One can indeed gain up to a factor six for the asymptotic variances. This
is explained by the reason mentioned in subsection 4.2: for $\ell_0$ small, increasing $\epsilon$ yields pairs of
observations that become dependent, and hence give information on the covariance structure.

On figure 5, we show the numerical results for the estimation of $\ell$ with $(\ell_0 = 2.7, \nu_0 = 1)$. For
ML, there is a slight improvement of the estimation with the irregularity of the spatial sampling.
However, for CV, there is a significant degradation of the estimation. Hence the irregularity of
the spatial sampling has more relative influence on CV than on ML. Finally, the advantage of
ML over CV for the estimation is by a factor seven, contrary to the case $\ell_0 = 0.5$, where this
factor was close to one.

Figure 3: Same setting as figure 2 but for the estimation of $\nu$. For approximately $0.4 < \ell_0 < 0.6$, the estimation
is damaged by locally perturbing the regular grid. For $0.6 < \ell_0 < 0.8$, the improvement of the estimation
is maximum, and remains positive for larger $\ell_0$. We retain the three particular points $(\ell_0 = 0.5, \nu_0 = 2.5)$,
$(\ell_0 = 0.7, \nu_0 = 2.5)$ and $(\ell_0 = 2.7, \nu_0 = 2.5)$ for further investigation in subsection 4.3.

Figure 4: Global influence of $\epsilon$ for the estimation of the correlation length $\ell$. Plot of the asymptotic variance for
ML (left) and CV (right), calculated with varying $n$, and of the second order Taylor series expansion given by
the value at $\epsilon = 0$ and the second derivative at $\epsilon = 0$. The true covariance function is Matérn with $\ell_0 = 0.5$ and
$\nu_0 = 5$. The irregularity of the spatial sampling globally improves the estimation for both ML and CV.

Figure 5: Same setting as in figure 4 but with $\ell_0 = 2.7$ and $\nu_0 = 1$. The irregularity of the spatial sampling
slightly improves ML estimation but degrades CV estimation.

On figure 6, we show the numerical results for the estimation of $\nu$ with $(\ell_0 = 0.5, \nu_0 = 2.5)$.
The numerical results are similar for ML and CV. For $\epsilon$ small, the asymptotic variance is very
large, because, $\ell_0$ being small, the observations are almost independent, as the observation points
are further apart than the correlation length, making inference on the dependence structure very
difficult. We see that, for $\epsilon = 0$, the asymptotic variance is several orders of magnitude larger
than for the estimation of $\ell$ in figure 4, where $\ell_0$ has the same value. Indeed, in the Matérn model,
$\nu$ is a smoothness parameter, and its estimation is very sensitive to the absence of observation
points with small spacing. Hence, naturally, for $\epsilon$ large, a threshold is reached where pairs of
dependent observations start to appear, greatly reducing the asymptotic variance. However, we
observe, as discussed for figure 3, that for $|\epsilon| < 0.2$, the asymptotic variance increases with $\epsilon$.
This non-monotonicity of the asymptotic variance with respect to $\epsilon$ is again a situation in which
irregular sampling gives a reduced hyper-parameter estimation performance compared to regular
sampling.

On figure 7, we show the numerical results for the estimation of $\nu$ with $(\ell_0 = 0.7, \nu_0 = 2.5)$.
The numerical results are similar for ML and CV. Similarly to figure 6, the asymptotic variance is
very large, because the observations are almost independent. However, this time the asymptotic
variance is globally decreasing with $\epsilon$. This variance is several orders of magnitude smaller for
large $\epsilon$, where pairs of dependent observations start to appear.
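
The mechanism invoked above (pairs of dependent observations appearing when the perturbation is large) can be illustrated with a short sketch. The setting is an assumption made for illustration only: dimension one, perturbations uniform on $[-1, 1]$, and a Matérn correlation with smoothness 5/2 written in its standard closed form, whose length scale convention may differ from the one used in this paper. The helper names are hypothetical.

```python
import math
import random

def perturbed_grid(n, eps, rng):
    """Observation points i + eps * X_i, i = 1..n, with X_i iid uniform on [-1, 1]."""
    return [i + eps * rng.uniform(-1.0, 1.0) for i in range(1, n + 1)]

def matern_5_2(t, ell):
    """Matern correlation with smoothness 5/2 (standard parametrization, an assumption)."""
    r = math.sqrt(5.0) * abs(t) / ell
    return (1.0 + r + r * r / 3.0) * math.exp(-r)

def max_neighbor_correlation(points, ell):
    """Largest correlation between consecutive (sorted) observation points."""
    pts = sorted(points)
    return max(matern_5_2(b - a, ell) for a, b in zip(pts, pts[1:]))

rng = random.Random(0)
ell0 = 0.5  # small correlation length, as in the case discussed above
regular = perturbed_grid(200, 0.0, rng)
irregular = perturbed_grid(200, 0.45, rng)
corr_regular = max_neighbor_correlation(regular, ell0)
corr_irregular = max_neighbor_correlation(irregular, ell0)
```

On the regular grid every neighboring pair has the same, small correlation, whereas on the perturbed grid some pairs are closer than the grid step and therefore more strongly correlated.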

On figure 8, we show the numerical results for the estimation of $\nu$ with $(\ell_0 = 2.7, \nu_0 = 2.5)$.
For both ML and CV, there is a global improvement of the estimation with the irregularity of
the spatial sampling. Moreover, the advantage of ML over CV for the estimation is by a factor
seven, contrary to figures 6 and 7, where this factor was close to one.

4.4. Discussion

The first conclusion is that a substantial irregularity of the spatial sampling is generally an
advantage for hyper-parameter estimation. Indeed, we have seen that, for ML, the asymptotic
variance is always smaller for $|\epsilon| > 0.2$ than for $\epsilon = 0$. This is also true for CV in the case
of the estimation of $\nu$. However, for the estimation of $\ell$, we can identify the cases where the
asymptotic variance is globally increasing with $\epsilon$. This can be due to the fact that the Leave-One-Out
errors in the CV functional are unnormalized. Hence, with $\epsilon \neq 0$, roughly speaking, error
terms concerning observation points with close neighbors are small, while error terms concerning
observation points without close neighbors are large. Hence, the CV functional mainly depends
on the large error terms and hence has a larger variance.

Figure 6: Same setting as in figure 4 but for the estimation of $\nu$ and with $\ell_0 = 0.5$ and $\nu_0 = 2.5$. Results are
similar for ML and CV. When $\epsilon = 0$, the estimation is difficult because the observations are almost independent.
The estimation is easier for $\epsilon$ large, where pairs of dependent observations start to appear. For $\epsilon$ small, the
asymptotic variance increases with $\epsilon$.

Figure 7: Same setting as in figure 4 but for the estimation of $\nu$ and with $\ell_0 = 0.7$ and $\nu_0 = 2.5$. Results are
similar for ML and CV. When $\epsilon = 0$, the estimation is difficult because the observations are almost independent.
The estimation is easier for $\epsilon$ large, where pairs of dependent observations start to appear.

Figure 8: Same setting as in figure 4 but for the estimation of $\nu$ and with $\ell_0 = 2.7$ and $\nu_0 = 2.5$. For both ML
and CV, there is a global improvement of the estimation with the irregularity of the spatial sampling. ML has a
substantial advantage over CV for the estimation.
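
The heterogeneity of the Leave-One-Out errors can be made concrete with a small sketch. It relies on the classical virtual Leave-One-Out identity, under which the predictive variance at point $i$ is $1 / (R^{-1})_{i,i}$. The exponential correlation function, the dimension one and the uniform perturbations are assumptions chosen for simplicity, and all function names are hypothetical.

```python
import math
import random

def invert(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c and m[r][c] != 0.0:
                fac = m[r][c]
                m[r] = [vr - fac * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]

def loo_variances(points, ell):
    """Leave-One-Out predictive variances 1 / (R^{-1})_{ii} (exponential correlation)."""
    R = [[math.exp(-abs(s - t) / ell) for t in points] for s in points]
    Rinv = invert(R)
    return [1.0 / Rinv[i][i] for i in range(len(points))]

def spread(values):
    """Ratio between the largest and the smallest value."""
    return max(values) / min(values)

rng = random.Random(1)
n, ell = 30, 1.0
regular = [float(i) for i in range(1, n + 1)]
irregular = [i + 0.45 * rng.uniform(-1.0, 1.0) for i in range(1, n + 1)]

# Boundary points are excluded so that only the effect of irregularity is visible.
spread_regular = spread(loo_variances(regular, ell)[1:-1])
spread_irregular = spread(loo_variances(irregular, ell)[1:-1])
```

On the regular grid the interior variances coincide, while on the perturbed grid they differ from point to point, so an unnormalized sum of squared errors is dominated by the points without close neighbors.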

Nevertheless, the second conclusion is that locally perturbing a regular observation grid
can damage the estimation for both ML (figure 3) and CV (figures 2 and 3). This is the most
important practical observation that follows from our detailed analysis of the asymptotic variances
of the hyper-parameter estimators.



5. Influence of hyper-parameter misspecification on prediction 



In proposition 5.1, we show that the misspecification of correlation hyper-parameters has an
asymptotic influence on the prediction errors. Indeed, the difference of the asymptotic
Leave-One-Out mean square errors, between incorrect and correct hyper-parameters, is lower and
upper bounded by finite positive constants times the integrated square difference between the
two correlation functions.



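Proposition 5.1. Assume that condition 2.1 is satisfied and that for all $\theta \in \Theta$, $K_\theta(0) = 1$.

Let, for $1 \le i \le n$, $\hat{y}_{i,\theta}(y_{-i}) := E_{\theta|X}(y_i | y_1, ..., y_{i-1}, y_{i+1}, ..., y_n)$ be the Kriging Leave-One-Out
prediction of $y_i$ with covariance hyper-parameters $\theta$. We then denote

$$D_p(\theta, \theta_0) := E\left[ \frac{1}{n} \sum_{i=1}^n \{ y_i - \hat{y}_{i,\theta}(y_{-i}) \}^2 \right] - E\left[ \frac{1}{n} \sum_{i=1}^n \{ y_i - \hat{y}_{i,\theta_0}(y_{-i}) \}^2 \right].$$

Then there exist constants $0 < A < B < +\infty$ so that, for $\epsilon = 0$,

$$A \sum_{v \in \mathbb{Z}^d} \{ K_\theta(v) - K_{\theta_0}(v) \}^2 \le \liminf_{n \to +\infty} D_p(\theta, \theta_0)$$

and

$$\limsup_{n \to +\infty} D_p(\theta, \theta_0) \le B \sum_{v \in \mathbb{Z}^d} \{ K_\theta(v) - K_{\theta_0}(v) \}^2 .$$

For $\epsilon \neq 0$, we denote $D_\epsilon = \cup_{v \in \mathbb{Z}^d \setminus 0} (v + \epsilon C_{S_X})$, with $C_{S_X} = \{ t_1 - t_2, t_1 \in S_X, t_2 \in S_X \}$. Then

$$A \int_{D_\epsilon} \{ K_\theta(t) - K_{\theta_0}(t) \}^2 dt \le \liminf_{n \to +\infty} D_p(\theta, \theta_0)$$

and

$$\limsup_{n \to +\infty} D_p(\theta, \theta_0) \le B \int_{D_\epsilon} \{ K_\theta(t) - K_{\theta_0}(t) \}^2 dt .$$

Proof. The lower bound is proved in the proof of proposition 3.4. The upper bound is obtained
with similar techniques. □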

6. Conclusion 

We have considered an increasing-domain asymptotic framework to address the influence of 
the irregularity of the spatial sampling on the estimation of the covariance hyper-parameters. 
This framework is based on a random sequence of observation points, for which the deviation 
from the regular grid is controlled by a single scalar regularity parameter e. 

We have proved consistency and asymptotic normality for the ML and CV estimators, under
rather minimal conditions. The asymptotic variances are deterministic functions of the regularity
parameter only. Hence, they are the natural tool to assess the influence of the irregularity of the
spatial sampling on the ML and CV estimators.

This is carried out by means of an exhaustive study of the Matérn model. We show
that the irregularity of the spatial sampling is generally an advantage for estimation.
However, we show that there exist cases for ML where disrupting a regular spatial sampling can,
on the contrary, damage estimation. In the CV case, estimation can be very strongly damaged
by a strong random perturbation of the grid.

Hence, the overall conclusion is that we definitely give a negative answer to the claim that
irregular sampling is always better for hyper-parameter estimation than regular sampling. The
influence of the regularity of the spatial sampling on the covariance hyper-parameter estimation
remains a non-trivial problem.

The CV criterion we have studied is a mean square error criterion. This is a classical CV
criterion that is used, for instance, in [20] when the CV and ML estimations of the
hyper-parameters are compared. We have shown that this CV estimator can have a considerably
larger asymptotic variance than the ML estimator, on the one hand, and can be sensitive to the
irregularity of the spatial sampling, on the other hand. Although this estimator performs better
than ML in cases of model misspecification [3], further research may aim at studying alternative
CV criteria that would have a better performance in the well-specified case.

Other CV criteria are proposed in the literature, for instance the LOO log-predictive
probability in [17] and Geisser's predictive mean square error in [25]. It would be interesting to
study, in the framework of this paper, the increasing-domain asymptotics for these estimators
and the influence of the irregularity of the spatial sampling.

In section 4, we pointed out that, when the spatial sampling is irregular, the mean square
error CV criterion could be composed of LOO errors with heterogeneous variances, which increases
the CV estimation variance. Methods to normalize the LOO errors would be an interesting
research direction to explore.
Appendix A. Proofs for section 3



In the proofs, we distinguish three probability spaces.

$(\Omega_X, \mathcal{F}_X, P_X)$ is the probability space associated with the random perturbation of the regular
grid. $(X_i)_{i \in \mathbb{N}^*}$ is a sequence of iid $S_X$-valued random variables defined on $(\Omega_X, \mathcal{F}_X, P_X)$, with
distribution $\mathcal{L}_X$. We denote by $\omega_X$ an element of $\Omega_X$.

$(\Omega_Y, \mathcal{F}_Y, P_Y)$ is the probability space associated with the Gaussian process. $Y$ is a centered
Gaussian process with covariance function $K_{\theta_0}$ defined on $(\Omega_Y, \mathcal{F}_Y, P_Y)$. We denote by $\omega_Y$ an
element of $\Omega_Y$.

$(\Omega, \mathcal{F}, P)$ is the product space $(\Omega_X \times \Omega_Y, \mathcal{F}_X \otimes \mathcal{F}_Y, P_X \times P_Y)$. We denote by $\omega$ an element
of $\Omega$.

All the random variables in the proofs can be defined relatively to the product space $(\Omega, \mathcal{F}, P)$.
Hence, all the probabilistic statements in the proofs hold with respect to this product space,
unless it is stated otherwise.

In the proofs, when $(f_n)_{n \in \mathbb{N}^*}$ is a sequence of real functions of $X = (X_i)_{i=1}^n$, $f_n$ is also a
sequence of real random variables on $(\Omega_X, \mathcal{F}_X, P_X)$. When we write that $f_n$ is bounded uniformly
in $n$ and $x$, we mean that there exists a finite constant $K$ so that $\sup_n \sup_{x \in S_X^n} |f_n(x)| \le K$. We
then have that $f_n$ is bounded $P_X$-a.s., i.e. $\sup_n f_n \le K$ for a.e. $\omega_X \in \Omega_X$. We may also write that
$f_n$ is lower-bounded uniformly in $n$ and $x$ when there exists $a > 0$ so that $\inf_n \inf_{x \in S_X^n} f_n(x) \ge a$.
When $f_n$ also depends on $\theta$, we say that $f_n$ is bounded uniformly in $n$, $x$ and $\theta$ when $\sup_{\theta \in \Theta} f_n$
is bounded uniformly in $n$ and $x$. We also say that $f_n$ is lower-bounded uniformly in $n$, $x$ and
$\theta$ when $\inf_{\theta \in \Theta} f_n$ is lower-bounded uniformly in $n$ and $x$.

When we write that $f_n$ converges to zero uniformly in $x$, we mean that $\sup_{x \in S_X^n} |f_n(x)| \to_{n \to +\infty} 0$.
One then has that $f_n$ converges to zero $P_X$-a.s. When $f_n$ also depends on $\theta$, we say that $f_n$
converges to zero uniformly in $x$ and $\theta$ when $\sup_{\theta \in \Theta} |f_n|$ converges to zero uniformly in $x$.

When $f_n$ is a sequence of real functions of $X$ and $Y$, $f_n$ is also a sequence of real random
variables on $(\Omega, \mathcal{F}, P)$. When we say that $f_n$ is bounded in probability conditionally to $X = x$ and
uniformly in $x$, we mean that, for every $\epsilon > 0$, there exist $M$, $N$ so that
$\sup_{n > N} \sup_{x \in S_X^n} P(|f_n| \ge M | X = x) \le \epsilon$.
One then has that $f_n$ is bounded in probability (defined on the product space).



Appendix A.1. Proof of proposition 3.1



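Proof. We show that there exist sequences of random variables, defined on $(\Omega_X, \mathcal{F}_X, P_X)$, $D_{\theta,\theta_0}$
and $D_{2,\theta,\theta_0}$ (functions of $n$ and $X$), so that $\sup_\theta |(L_\theta - L_{\theta_0}) - D_{\theta,\theta_0}| \to_p 0$ (in probability of the
product space) and $D_{\theta,\theta_0} \ge B D_{2,\theta,\theta_0}$ $P_X$-a.s. for a constant $B > 0$. We then show that there
exists $D_{\infty,\theta,\theta_0}$, a deterministic function of $\theta, \theta_0$ only, so that $\sup_\theta |D_{2,\theta,\theta_0} - D_{\infty,\theta,\theta_0}| = o_p(1)$
and for any $\alpha > 0$,

$$\inf_{|\theta - \theta_0| \ge \alpha} D_{\infty,\theta,\theta_0} > 0. \quad (A.1)$$

This implies consistency.

We have $L_\theta = \frac{1}{n} \log\{\det(R_\theta)\} + \frac{1}{n} y^t R_\theta^{-1} y$. The eigenvalues of $R_\theta$ and $R_\theta^{-1}$ being bounded
uniformly in $n$ and $x$ (lemma Appendix C.5), $var(L_\theta | X = x)$ converges to $0$ uniformly in $x$,
and so $L_\theta - E(L_\theta | X)$ converges in probability $P$ to zero.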

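Then, with $z = R_{\theta_0}^{-1/2} y$,

$$\sup_{k \in \{1,...,p\}, \theta \in \Theta} \left| \frac{\partial L_\theta}{\partial \theta_k} \right| = \sup_{k \in \{1,...,p\}, \theta \in \Theta} \frac{1}{n} \left| \mathrm{Tr}\left( R_\theta^{-1} \frac{\partial R_\theta}{\partial \theta_k} \right) - z^t R_{\theta_0}^{1/2} R_\theta^{-1} \frac{\partial R_\theta}{\partial \theta_k} R_\theta^{-1} R_{\theta_0}^{1/2} z \right|$$

$$\le \sup_{k \in \{1,...,p\}, \theta \in \Theta} \max\left( ||R_\theta^{-1}|| \, \left|\left| \frac{\partial R_\theta}{\partial \theta_k} \right|\right| , \; ||R_{\theta_0}|| \, ||R_\theta^{-1}||^2 \left|\left| \frac{\partial R_\theta}{\partial \theta_k} \right|\right| \right) \left( 1 + \frac{1}{n} |z|^2 \right),$$

and is hence bounded in probability conditionally to $X = x$, uniformly in $x$, because of lemma
Appendix C.5 and the fact that $z \sim \mathcal{N}(0, I_n)$ given $X = x$.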



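Because of the simple convergence and the boundedness of the derivatives, $\sup_\theta |L_\theta - E(L_\theta|X)| \to_p 0$.
We then denote $D_{\theta,\theta_0} := E(L_\theta|X) - E(L_{\theta_0}|X)$. We then have $\sup_\theta |(L_\theta - L_{\theta_0}) - D_{\theta,\theta_0}| \to_p 0$.
We have $E(L_\theta|X) = \frac{1}{n} \log\{\det(R_\theta)\} + \frac{1}{n} \mathrm{Tr}(R_\theta^{-1} R_{\theta_0})$ and hence, $P_X$-a.s.

$$D_{\theta,\theta_0} = \frac{1}{n} \log\{\det(R_\theta)\} + \frac{1}{n} \mathrm{Tr}(R_\theta^{-1} R_{\theta_0}) - \frac{1}{n} \log\{\det(R_{\theta_0})\} - 1$$

$$= \frac{1}{n} \sum_{i=1}^n \left[ - \log\left\{ \phi_i\left( R_{\theta_0}^{1/2} R_\theta^{-1} R_{\theta_0}^{1/2} \right) \right\} + \phi_i\left( R_{\theta_0}^{1/2} R_\theta^{-1} R_{\theta_0}^{1/2} \right) - 1 \right].$$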
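
The conditional expectation of the likelihood criterion follows from a standard trace identity. As a short worked step, using only the notation of the proof and the fact that $y$ is centered Gaussian with covariance matrix $R_{\theta_0}$ given $X$:

```latex
\begin{align*}
E\left( y^t R_\theta^{-1} y \mid X \right)
  &= E\left\{ \mathrm{Tr}\left( R_\theta^{-1} y y^t \right) \mid X \right\}
   = \mathrm{Tr}\left( R_\theta^{-1} R_{\theta_0} \right), \\
E\left( L_{\theta_0} \mid X \right)
  &= \frac{1}{n} \log\{\det(R_{\theta_0})\} + \frac{1}{n} \mathrm{Tr}(I_n)
   = \frac{1}{n} \log\{\det(R_{\theta_0})\} + 1 .
\end{align*}
```

The second line explains the constant $-1$ appearing in the difference of the two conditional expectations.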

Using proposition Appendix C.4 and lemma Appendix C.5, there exist $0 < a < b < +\infty$
so that for all $x$, $n$, $\theta$, $a < \phi_i( R_{\theta_0}^{1/2} R_\theta^{-1} R_{\theta_0}^{1/2} ) < b$. We denote $f(t) = -\log(t) + t - 1$. As $f$ is
minimal in $1$, $f'(1) = 0$ and $f''(1) = 1$, there exists $A > 0$ so that, for $t \in [a, b]$, $f(t)$ is larger
than $A (t - 1)^2$. Then,

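$$D_{\theta,\theta_0} \ge A \frac{1}{n} \sum_{i=1}^n \left\{ 1 - \phi_i\left( R_{\theta_0}^{1/2} R_\theta^{-1} R_{\theta_0}^{1/2} \right) \right\}^2 = A \frac{1}{n} \mathrm{Tr}\left\{ \left( I - R_{\theta_0}^{1/2} R_\theta^{-1} R_{\theta_0}^{1/2} \right)^2 \right\}$$

$$= A \left| R_\theta^{-1/2} ( R_\theta - R_{\theta_0} ) R_\theta^{-1/2} \right|^2 .$$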
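
The quadratic lower bound on $f(t) = -\log(t) + t - 1$ can be checked numerically. The sketch below is only an illustration: the interval $[0.2, 5]$ is an arbitrary stand-in for $[a, b]$, and the constant is estimated on a finite grid rather than derived analytically.

```python
import math

def f(t):
    """The function f(t) = -log(t) + t - 1, minimal (and zero) at t = 1."""
    return -math.log(t) + t - 1.0

def quadratic_constant(a, b, num=2001):
    """Grid estimate of A = min over [a, b] of f(t) / (t - 1)^2 (t = 1 excluded)."""
    best = float("inf")
    for k in range(num):
        t = a + (b - a) * k / (num - 1)
        if abs(t - 1.0) > 1e-9:
            best = min(best, f(t) / (t - 1.0) ** 2)
    return best

# Arbitrary illustrative interval playing the role of [a, b].
A = quadratic_constant(0.2, 5.0)
```

The ratio $f(t) / (t - 1)^2$ tends to $1/2$ at $t = 1$ and stays positive on any compact interval of positive numbers, which is why a constant $A > 0$ exists once the eigenvalues are confined to $[a, b]$.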



Then, as the eigenvalues of $R_\theta^{-1/2}$ are larger than $c > 0$, uniformly in $n$, $x$ and $\theta$, and with
$|MN|^2 \ge \inf_i \phi_i^2(M) |N|^2$ for $M$ symmetric positive, we obtain, for some $B > 0$, and uniformly
in $n$, $x$ and $\theta$,

$$D_{\theta,\theta_0} \ge B |R_\theta - R_{\theta_0}|^2 := B D_{2,\theta,\theta_0} .$$



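For $\epsilon = 0$, $D_{2,\theta,\theta_0}$ is deterministic and converges to

$$D_{\infty,\theta,\theta_0} := \sum_{v \in \mathbb{Z}^d} \{ K_\theta(v) - K_{\theta_0}(v) \}^2 . \quad (A.2)$$

$D_{\infty,\theta,\theta_0}$ is continuous in $\theta$ because the series of term $\sup_\theta |K_\theta(v)|^2$, $v \in \mathbb{Z}^d$, is summable using
(1) and lemma Appendix C.1. Hence, if there exists $\alpha > 0$ so that $\inf_{|\theta - \theta_0| \ge \alpha} D_{\infty,\theta,\theta_0} = 0$, we can,
using a compactness and continuity argument, have $\theta_\infty \neq \theta_0$ so that (A.2) is null. Hence we
showed (A.1) by contradiction, which shows the proposition for $\epsilon = 0$.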



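For $\epsilon \neq 0$, $D_{2,\theta,\theta_0} = \frac{1}{n} \mathrm{Tr}\{ (R_\theta - R_{\theta_0})^2 \}$. With fixed $\theta$, using proposition Appendix C.7,
$D_{2,\theta,\theta_0}$ converges in $P_X$-probability to $D_{\infty,\theta,\theta_0} := \lim_{n \to \infty} E_X(D_{2,\theta,\theta_0})$. The eigenvalues of the
$\frac{\partial R_\theta}{\partial \theta_i}$, $1 \le i \le p$, being bounded uniformly in $n$, $\theta$, $x$, the partial derivatives with respect to $\theta$ of
$D_{2,\theta,\theta_0}$ are uniformly bounded in $n$, $\theta$ and $x$. Hence $\sup_\theta |D_{2,\theta,\theta_0} - D_{\infty,\theta,\theta_0}| = o_p(1)$. Then

$$D_{\infty,\theta,\theta_0} = \lim_{n \to +\infty} \frac{1}{n} \sum_{1 \le i,j \le n, i \neq j} \int_{\epsilon C_{S_X}} \{ K_\theta(v_i - v_j + t) - K_{\theta_0}(v_i - v_j + t) \}^2 f_T(t) dt + \{ K_\theta(0) - K_{\theta_0}(0) \}^2 ,$$

with $f_T(t)$ the probability density function of $\epsilon (X_i - X_j)$, $i \neq j$. We then show,

$$D_{\infty,\theta,\theta_0} = \sum_{v \in \mathbb{Z}^d \setminus 0} \int_{\epsilon C_{S_X}} \{ K_\theta(v + t) - K_{\theta_0}(v + t) \}^2 f_T(t) dt + \{ K_\theta(0) - K_{\theta_0}(0) \}^2$$

$$= \int_{D_\epsilon} \{ K_\theta(t) - K_{\theta_0}(t) \}^2 \tilde{f}_T(t) dt + \{ K_\theta(0) - K_{\theta_0}(0) \}^2 , \quad (A.3)$$

where $\tilde{f}_T(t) := \sum_{v \in \mathbb{Z}^d \setminus 0} f_T(t - v)$ collects the translated densities.

As $\sup_\theta |K_\theta(t)|^2$ is summable on $D_\epsilon$, using (1), $D_{\infty,\theta,\theta_0}$ is continuous. Hence, if there exists
$\alpha > 0$ so that $\inf_{|\theta - \theta_0| \ge \alpha} D_{\infty,\theta,\theta_0} = 0$, we can, using a compactness and continuity argument, show
that there exists $\theta_\infty \neq \theta_0$ so that (A.3) is null. Hence we proved (A.1) by contradiction, which
proves the proposition for $\epsilon \neq 0$.

□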
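Appendix A.2. Proof of proposition 3.2

Proof. For $1 \le i,j \le p$, we use proposition Appendix C.7 to show that
$\frac{1}{n} \mathrm{Tr}\left( R_{\theta_0}^{-1} \frac{\partial R_{\theta_0}}{\partial \theta_i} R_{\theta_0}^{-1} \frac{\partial R_{\theta_0}}{\partial \theta_j} \right)$ has a $P_X$-almost sure limit as $n \to +\infty$.

We calculate $\frac{\partial}{\partial \theta_i} L_\theta = \frac{1}{n} \left\{ \mathrm{Tr}\left( R_\theta^{-1} \frac{\partial R_\theta}{\partial \theta_i} \right) - y^t R_\theta^{-1} \frac{\partial R_\theta}{\partial \theta_i} R_\theta^{-1} y \right\}$. We use proposition Appendix C.9
with $M_i = R_{\theta_0}^{-1} \frac{\partial R_{\theta_0}}{\partial \theta_i}$ and $N_i = - R_{\theta_0}^{-1} \frac{\partial R_{\theta_0}}{\partial \theta_i} R_{\theta_0}^{-1}$, together with proposition Appendix C.7, to
show that

$$\sqrt{n} \frac{\partial}{\partial \theta} L_{\theta_0} \to_{\mathcal{L}} \mathcal{N}(0, 2 \Sigma_{ML}) .$$

We calculate

$$\frac{\partial^2}{\partial \theta_i \partial \theta_j} L_{\theta_0} = \frac{1}{n} \mathrm{Tr}\left( - R_{\theta_0}^{-1} \frac{\partial R_{\theta_0}}{\partial \theta_i} R_{\theta_0}^{-1} \frac{\partial R_{\theta_0}}{\partial \theta_j} + R_{\theta_0}^{-1} \frac{\partial^2 R_{\theta_0}}{\partial \theta_i \partial \theta_j} \right)$$

$$+ \frac{1}{n} y^t \left( 2 R_{\theta_0}^{-1} \frac{\partial R_{\theta_0}}{\partial \theta_i} R_{\theta_0}^{-1} \frac{\partial R_{\theta_0}}{\partial \theta_j} R_{\theta_0}^{-1} - R_{\theta_0}^{-1} \frac{\partial^2 R_{\theta_0}}{\partial \theta_i \partial \theta_j} R_{\theta_0}^{-1} \right) y .$$

Hence, using proposition Appendix C.8, $\frac{\partial^2}{\partial \theta_i \partial \theta_j} L_{\theta_0}$ converges to $(\Sigma_{ML})_{i,j}$ in the mean square sense
(on the product space).

Finally, $\frac{\partial^3}{\partial \theta_i \partial \theta_j \partial \theta_k} L_\theta$ can be written as $\frac{1}{n} \{ \mathrm{Tr}(M_\theta) + z^t N_\theta z \}$, where $M_\theta$ and $N_\theta$ are sums of
matrices of $\mathcal{M}_\theta$ (proposition Appendix C.7) and where $z$ depends on $X$ and $Y$ and $\mathcal{L}(z|X) = \mathcal{N}(0, I_n)$.
Hence, the singular values of $M_\theta$ and $N_\theta$ are bounded uniformly in $\theta$, $n$ and $x$, and so
$\sup_{i,j,k,\theta} \left| \frac{\partial^3}{\partial \theta_i \partial \theta_j \partial \theta_k} L_\theta \right|$ is bounded by $a + b \frac{1}{n} |z|^2$, with constant $a, b < +\infty$, and is hence bounded
in probability. Hence we apply proposition Appendix C.10 to conclude.

□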



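Appendix A.3. Proof of proposition 3.3

Proof. We firstly prove the proposition in the case $p = 1$, when $\Sigma_{ML}$ is a scalar. We then show
how to generalize the proposition to the case $p > 1$.

For $p = 1$ we have seen that $\frac{1}{n} \mathrm{Tr}\left( R_{\theta_0}^{-1} \frac{\partial R_{\theta_0}}{\partial \theta} R_{\theta_0}^{-1} \frac{\partial R_{\theta_0}}{\partial \theta} \right) \to_{P_X} \Sigma_{ML}$. Then

$$\frac{1}{n} \mathrm{Tr}\left( R_{\theta_0}^{-1} \frac{\partial R_{\theta_0}}{\partial \theta} R_{\theta_0}^{-1} \frac{\partial R_{\theta_0}}{\partial \theta} \right) = \frac{1}{n} \mathrm{Tr}\left( R_{\theta_0}^{-1/2} \frac{\partial R_{\theta_0}}{\partial \theta} R_{\theta_0}^{-1/2} R_{\theta_0}^{-1/2} \frac{\partial R_{\theta_0}}{\partial \theta} R_{\theta_0}^{-1/2} \right)$$

$$\ge \inf_{i,n,x} \phi_i\left( R_{\theta_0}^{-1/2} \right)^4 \left| \frac{\partial R_{\theta_0}}{\partial \theta} \right|^2 .$$

By lemma Appendix C.5, there exists $a > 0$ so that $\inf_{i,n,x} \phi_i\left( R_{\theta_0}^{-1/2} \right)^4 \ge a$. We then show,
similarly to the proof of proposition 3.1, that the limit of $\left| \frac{\partial R_{\theta_0}}{\partial \theta} \right|^2$ is positive.

We now address the case $p > 1$. Let $v_\lambda = (\lambda_1, ..., \lambda_p)^t \in \mathbb{R}^p$, $v_\lambda$ different from zero. We define
the model $\{ K_\delta, \delta \in [\delta_{inf}, \delta_{sup}] \}$, with $\delta_{inf} < 0 < \delta_{sup}$, by $K_\delta = K_{(\theta_0)_1 + \delta \lambda_1, ..., (\theta_0)_p + \delta \lambda_p}$. Then
$K_{\delta = 0} = K_{\theta_0}$. We have $\frac{\partial}{\partial \delta} K_{\delta = 0}(t) = \sum_{k=1}^p \lambda_k \frac{\partial}{\partial \theta_k} K_{\theta_0}(t)$, so the model $\{ K_\delta, \delta \in [\delta_{inf}, \delta_{sup}] \}$
verifies the hypotheses of the proposition for $p = 1$. Hence, the $P$-mean square limit of $\frac{\partial^2}{\partial \delta^2} L_{\delta = 0}$
is positive. We conclude with

$$\frac{\partial^2}{\partial \delta^2} L_{\delta = 0} = v_\lambda^t \left( \frac{\partial^2}{\partial \theta^2} L_{\theta_0} \right) v_\lambda .$$

□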



Appendix A.4. Proof of proposition 3.4

Proof. We will show that there exists a sequence of random variables defined on $(\Omega_X, \mathcal{F}_X, P_X)$,
$D_{\theta,\theta_0}$, so that $\sup_\theta |(CV_\theta - CV_{\theta_0}) - D_{\theta,\theta_0}| \to_p 0$ and $C > 0$ so that $P_X$-a.s.

$$D_{\theta,\theta_0} \ge C |R_\theta - R_{\theta_0}|^2 . \quad (A.4)$$

The proof of the proposition is then carried out similarly to the proof of proposition 3.1.

We firstly show, similarly to the proof of proposition 3.1, that $\sup_\theta |CV_\theta - E(CV_\theta|X)| \to_p 0$.
We then denote $D_{\theta,\theta_0} = E(CV_\theta|X) - E(CV_{\theta_0}|X)$. We decompose, for all $i \in \{1,...,n\}$, with $P_i$
the matrix that exchanges lines $1$ and $i$ of a matrix,

$$P_i R_\theta P_i^t = \begin{pmatrix} 1 & r_{i,\theta}^t \\ r_{i,\theta} & R_{-i,\theta} \end{pmatrix} .$$


The conditional laws being independent on the numbering of the observations, we have, using 
the Kriging equations 



Do 



1 n ( 2 1 

- E E { r leR--l e y-i - rl eQ R-_l eo y-i) \X 

i=l ' 
1 n 

in 



rUR~), 



rXo R_] t6o ) R-i,e [R_] fi ri,e - R_lg o n,e 



Similarly to lemma Appendix C.5| it can be shown that the eigenvalues of R-i,g are larger 
than a constant A > 0, uniformly in n and x. Then 



1 n 

*o > A -Y.\\{<e R -le-<eo R -le 
»=i 



Using the virtual Cross Validation equations |18l ch.5.2], the vector R_\gTi g is the vector of the 



for 1 < j < n, j ^ i. Hence Px-a.s. 



> 



n 

EE 



"o 1 1,] 



n ^"~f- \ [Rg 

i=l y V a 



R a 



A- \diag (Rg 1 ) 1 Rg 1 - diag (Rg 1 ) 1 fl" 1 



> AB- 



diag (R g ) diag (R g 



Ra — Ra 



with B = inf 



£ > 0. 



The eigenvalues of diag (Rg 1 ) diag 1 ) are bounded between a > and b < oo uniformly 
in n and x. Hence we have, with D\ 7 the diagonal matrix with values Ai, A„, 



D„ 



> AB inf \DxRg 1 - R~} 



a<\i,...,\ n <b 1 



> ABC inf \DiR e - Rg Q \ 2 , using [8] theorem 2.1 

a<Ai,...,A n <6 A 

> ABC , inf, iDxEe-EeJ 2 , 



Al,...,Ar, 



with C = \ inf„ x 



C > 0. Then 



ll^ll 2 ll^floll 2 ' 

1 - 

-De,e > ABC— inf (KRe.ij — Re s,jY 



: J = 1 



ABG \ E¥E (^.u - '•"•- r ' 

77, A 

»=i j=i 

n ( 

ABC ~ E ^ s ( A - !) 2 + E :A// "< • - «9o,* fc 

»=1 [ 
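Lemma Appendix A.1. For any $a_1, ..., a_n$ and $b_1, ..., b_n \in \mathbb{R}$,

$$\inf_{\lambda \in \mathbb{R}} \left\{ (\lambda - 1)^2 + \sum_{i=1}^n (a_i - \lambda b_i)^2 \right\} \ge \frac{\sum_{i=1}^n (a_i - b_i)^2}{1 + \sum_{i=1}^n b_i^2} .$$

Proof. The minimum in $x$ of $a x^2 - 2 b x + c$ (with $a > 0$) is $- \frac{b^2}{a} + c$, hence

$$(\lambda - 1)^2 + \sum_{i=1}^n (a_i - \lambda b_i)^2 \ge 1 + \sum_{i=1}^n a_i^2 - \frac{\left( 1 + \sum_{i=1}^n a_i b_i \right)^2}{1 + \sum_{i=1}^n b_i^2}$$

$$= \frac{\sum_{i=1}^n (a_i - b_i)^2 - \left( \sum_{i=1}^n a_i b_i \right)^2 + \left( \sum_{i=1}^n a_i^2 \right) \left( \sum_{i=1}^n b_i^2 \right)}{1 + \sum_{i=1}^n b_i^2}$$

$$\ge \frac{\sum_{i=1}^n (a_i - b_i)^2}{1 + \sum_{i=1}^n b_i^2} , \text{ using Cauchy-Schwarz inequality.}$$

□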
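
The inequality of lemma Appendix A.1, read here as a lower bound on the minimum over $\lambda$ of $(\lambda - 1)^2 + \sum_i (a_i - \lambda b_i)^2$, can be checked numerically. The sketch below is an illustration on randomly drawn vectors (the sampling ranges are arbitrary assumptions), using the closed-form minimum of a quadratic function.

```python
import random

def objective(lam, a, b):
    """(lam - 1)^2 + sum_i (a_i - lam * b_i)^2."""
    return (lam - 1.0) ** 2 + sum((ai - lam * bi) ** 2 for ai, bi in zip(a, b))

def minimum_over_lambda(a, b):
    """Exact minimum of the quadratic p * lam^2 - 2 * q * lam + c, i.e. c - q^2 / p."""
    p = 1.0 + sum(bi * bi for bi in b)
    q = 1.0 + sum(ai * bi for ai, bi in zip(a, b))
    c = 1.0 + sum(ai * ai for ai in a)
    return c - q * q / p

def lower_bound(a, b):
    """Right-hand side of the lemma: sum (a_i - b_i)^2 / (1 + sum b_i^2)."""
    num = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return num / (1.0 + sum(bi * bi for bi in b))

rng = random.Random(0)
gaps = []
for _ in range(1000):
    n = rng.randint(1, 6)
    a = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    b = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    gaps.append(minimum_over_lambda(a, b) - lower_bound(a, b))
smallest_gap = min(gaps)
```

The gap between the exact minimum and the lower bound equals the Cauchy-Schwarz defect divided by $1 + \sum_i b_i^2$, so it is non-negative and vanishes when the two vectors are collinear.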



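Using lemma Appendix A.1 together with (1) and lemma Appendix C.1, which ensures
that $\sum_{j \neq i} (R_{\theta,i,j})^2 \le c < +\infty$ uniformly in $i$, $\theta$ and $x$, we obtain

$$D_{\theta,\theta_0} \ge ABC \frac{1}{1 + c} \frac{1}{n} \sum_{i=1}^n \sum_{j \neq i} ( R_{\theta,i,j} - R_{\theta_0,i,j} )^2$$

$$= ABC \frac{1}{1 + c} |R_\theta - R_{\theta_0}|^2 , \text{ because } R_{\theta,i,i} = 1 = R_{\theta_0,i,i} ,$$

which proves (A.4) and ends the proof.

□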



Appendix A.5. Proof of proposition 3.5
Proof. It is shown in [3] that gf-CVg = \y l Mly = ±y* + (M|)*| y. We then show that 
cov (y t Ay,y t By\X) = 2Tr (ARg BRg ) for symmetric matrices A and £?, which shows Q. 
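
To see where the factor $2$ in this covariance identity comes from, a worked special case may help. Assume, purely for illustration, that $R_{\theta_0}$, $A$ and $B$ are diagonal with diagonal entries $\sigma_i^2$, $a_i$ and $b_i$ (this diagonal setting is an assumption of the example, not the general case treated in the proof). The components $y_i$ are then independent given $X$, with $var(y_i^2) = 2 \sigma_i^4$, so that:

```latex
\begin{align*}
\mathrm{cov}\left( y^t A y, \, y^t B y \mid X \right)
  &= \sum_{i=1}^n a_i b_i \, \mathrm{var}\left( y_i^2 \right)
   = 2 \sum_{i=1}^n a_i b_i \sigma_i^4
   = 2 \, \mathrm{Tr}\left( A R_{\theta_0} B R_{\theta_0} \right) .
\end{align*}
```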
A straightforward but relatively long calculation then shows
02 CVe = -A^R^^R^diagiR^y'diaglR^^R^R, 



dOiddj 



V 9 Mi 



y 



' A l/ R e ld ^ R e ld ^( R oT"d^(R e ld ^Re : ) K '// 



+6-y t Rg 1 diag (Rg 1 
-A-ytR^diag (Rg 1 
+2-y t R e 1 diag (Rg 1 
+2-y t R e 1 diag (Rg 1 
+2-y t R e 1 diag (Rg 1 
-2-y*Rg 1 diag (Rg 1 



96» 

-)2 



90i90J 

-2 ,9^ i 

-2 tdR e idRe , 

-2 ! 9 2 fle , 
^ 9^90^ y - 



We then have, using E(y*Ay|X) = Tr(ARe ) and for matrices D, Mi and M2, with Z) 
diagonal, Tr {MiDdiag (M 2 )} = Tr {M 2 Ddiag (Mi)} and Tr(DMi) = Tr (DM'j), 



E 



Wo\x) = -^Tr{^R-diag(R-)^diag(R- o 1 ^R- n 1 )R- o 1 



(A.5) 



(» ' M diag Oln..') "diag (v'^-H* dia„, ( I: 1 I: 1 ) R ,~ 



1, 

n 



'4 n Tr^diag(R,- 1 



+2-Tr<| diagfR, 



- 4 iTr|diag(R- o 1 
-2-Tr/diagfRZ 1 



9(9j 



,_i9R e 
e ° 90i 



^diagfR-^R-^RzMH,/ 



96», 



90j 



-2 i 9R g _i 9R e _i 

8 « 90i 8 « 90j 9 « 

-2 i 9 2 R e i\ 
e ° 90^ 00 J • 



#0 / $o 



The fourth and sixth terms of (A.5 1 are opposite and hence cancel each other. Indeed, 
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T r {diag(R-)- 3 diag (h^H^H^) R- 



-1\~ 3 ( D-l^ O r>-ldROo d-1 



R 



Ra 



-Ra 



Ys( R 0oJ, V""" <W; 'W, 

i=l V * 3 



i = l 

Tr^diag(R,M " R 



(961., 



-i\-2 D -A R -A R ^i 
'° aft e » 9ft e » 



Similarly the fifth and seventh terms of (A.5) cancel each other.

Hence, we show the expression of $E\left( \frac{\partial^2}{\partial \theta_i \partial \theta_j} CV_{\theta_0} | X \right)$ of the proposition.

We use proposition Appendix C.7 to show the existence of $\Sigma_{CV,1}$ and $\Sigma_{CV,2}$.

□

Appendix A.6. Proof of proposition 3.6



Proof. We use proposition Appendix C.9 with M % the notation of proposition 3.5 and 

Ni = — |m* + (AT)' j , 
together with propositions | Appendix C.7| and |3.5| to show that 

\fc^CV Bo ->A/'(0,E O v,i)- 



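We have seen in the proof of proposition 3.5 that there exist matrices $P_{i,j}$ in $\mathcal{M}_{\theta_0}$ (proposition
Appendix C.7), so that $\frac{\partial^2}{\partial \theta_i \partial \theta_j} CV_{\theta_0} = \frac{1}{n} y^t P_{i,j} y$, with $\frac{1}{n} \mathrm{Tr}(P_{i,j} R_{\theta_0}) \to (\Sigma_{CV,2})_{i,j}$ $P_X$-almost surely.
Hence, using proposition Appendix C.8, $\frac{\partial^2}{\partial \theta_i \partial \theta_j} CV_{\theta_0}$ converges to $(\Sigma_{CV,2})_{i,j}$ in the mean square sense
(on the product space).

Finally, $\frac{\partial^3}{\partial \theta_i \partial \theta_j \partial \theta_k} CV_\theta$ can be written as $\frac{1}{n} \left( z^t N_\theta^{i,j,k} z \right)$, where $N_\theta^{i,j,k}$ are sums of matrices of
$\mathcal{M}_\theta$ (proposition Appendix C.7) and $z$ depending on $X$ and $Y$ with $\mathcal{L}(z|X) = \mathcal{N}(0, I_n)$. The
singular values of $N_\theta^{i,j,k}$ are bounded uniformly in $\theta$, $n$ and $x$ and so $\sup_{i,j,k,\theta} \left( \frac{\partial^3}{\partial \theta_i \partial \theta_j \partial \theta_k} CV_\theta \right)$
is bounded by $b \frac{1}{n} z^t z$, $b < +\infty$, and is hence bounded in probability. We apply proposition
Appendix C.10 to conclude.

□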

Appendix A.7. Proof of proposition 3.7

Proof. We show the proposition in the case $p = 1$, the generalization to the case $p > 1$ being
the same as in proposition 3.3.

Similarly to the proof of proposition 



3.1 



we show that 



We will then show that there exists C > U so that P^-a.s 



CVg„ — E 



g g i ^ * H 



[§fflCVg \ 



X 



^ P o. 



E 



de 2 



cv 8o \x) >c 



dR 6 



m 



(A.6) 



The proof of the proposition will hence be carried out similarly as in the proof of proposition 

ism 

-§02CVg o can be written as z l Mz with z depending on X and Y and (z|X) = Af (0, I n ), and 
M a sum of matrices of Mg (proposition Appendix C.7 1. Hence, using proposition [Appendix 
C.7



uniformly in n, sup e 



< a—z z with a < +00. Hence, for fixed n, we can exchange 



derivatives and means conditionally to X and so 



Then, with r^g, R-i^g and y_j the notation of the proof of proposition 3.4 

1 n r f 2 



^E (i-^=l 



1 " 

n ^ ( 



e 



By differentiating twice with respect to 9 and taking the value at 9q we obtain 
d 2 



E 



;CVg \X 



> 



1 n ( 1 ^ ( 



) r i,9 



— (r- 1 r* 



with A = inf 7l)i , ai ^ (R- t .e ), A>0, 



then, using the virtual CV formulas |18l 17]. 



> 



1 - 

7") EE 



=1 fri L 



= A 

= A 

> A 2 B 

> A 2 B inf 



d ^ [ R^K) W)" 1 - ^o 1 ^ 



with £? = inf i 



\i,...,\„ 



> A B inf 
Ai,...,A„ 



Re D\ — 



dRg c 



Then, as if e (0) = 1 for all 0, and hence ^if^ (0) = 0, 



Jx) > A 2 B 2 inf -Y 
1 ~ Ai,...,a„ n ^ 



A 2 + E<U 



OR 



0„ 



1 " 



n * — ' a 

i=l 



A 2 +E U{Re \, 



8 



00 



1,3 



1,3 . 



We then show, similarly to lemma [Appendix A.l| that 

En 2 



A 2 + E K - > : f^f " 

i=1 1 + Z^=i ° 4 



(A.7) 
Hence, with $C \in [1, +\infty)$, by using (1) and lemma Appendix C.1,



d9 2 



C 



A 2 B 2 
C 



dRg 



i)f) 



V 09 



because ^K 6o(fi ) = 0. 



We then showed (A.6), which concludes the proof in the case $p = 1$.



□ 



Appendix B. Proofs for section 4

Appendix B.1. Proof of proposition 4.2

Proof. It is enough to show the proposition for $\epsilon \in [0, \alpha]$ for all $\alpha < \frac{1}{2}$. We use the following
lemma.

Lemma Appendix B.1. Let $f_n$ be a sequence of $C^2$ functions on a segment of $\mathbb{R}$. We assume
$f_n \to_{unif} f$, $f_n' \to_{unif} g$, $f_n'' \to_{unif} h$. Then, $f$ is $C^2$, $f' = g$, and $f'' = h$.

We denote $f_n(\epsilon) = \frac{1}{n} E\{ \mathrm{Tr}(M_n) \}$ where $(M_n)_{n \in \mathbb{N}^*}$ is a random matrix sequence defined on
$(\Omega_X, \mathcal{F}_X, P_X)$ which belongs to $\mathcal{M}_\theta$ (proposition Appendix C.7). We showed that $f_n$ converges
simply to $\Sigma$ on $[0, \alpha]$. We firstly use the dominated convergence theorem to show that $f_n$ is $C^2$
and that $f_n'$ and $f_n''$ are of the form

$$E\left\{ \frac{1}{n} \mathrm{Tr}(N) \right\} , \quad (B.1)$$

with $N$ a sum of random matrix sequences of $\tilde{\mathcal{M}}_\theta$. $\tilde{\mathcal{M}}_\theta$ is similar to $\mathcal{M}_\theta$ (proposition Appendix
C.7), with the addition of the derivative matrices with respect to $\epsilon$. We can then, using (6),
adapt proposition Appendix C.7 to show that $f_n'$ and $f_n''$ converge simply to some functions $g$
and $h$ on $[0, \alpha]$.

Finally, still adapting proposition Appendix C.7, the singular values of $N$ are bounded
uniformly in $x$ and $n$. Hence, using $\mathrm{Tr}(M) \le n ||M||$, the derivatives of $f_n$, $f_n'$ and $f_n''$ are
bounded uniformly in $n$, so that the simple convergence implies the uniform convergence. The
conditions of lemma Appendix B.1 are hence fulfilled.
□ 



Appendix C. Technical results 



In subsection Appendix C.1, we state several technical results that are used in the proofs of
the results of sections 3 and 4. Proofs are given in subsection Appendix C.2.

Appendix C.1. Statement of the technical results

Lemma Appendix C.l. Let f : R d -> R+ , so that f (t) < — J-^. Then, for all i £ N* , 

e £ (-|, |) and (x l ) l£N , £ Sf, 



/ . 3 \ d—1 

£ f{v i -v j +e(x i -x j )}<2 d dJ2 \,% +1 ■ 

Lemma Appendix C.2. Let f : R d -> R+ , so that f (t) < . We consider 5 < i . 

Then, for alii £ N*, a > 0, e £ [—6, S] and (xi) iGN , £ , 



j ei 



Y f[a{v l ~v 1 +e{x l ~x j )}]<2 d dY 

^ ■ ' ' V J ' Si ~ ^ 1 + a d +i (j + 1 - 2S) d+1 
Proof. Similar to the proof of lemma Appendix C.1. □

Lemma Appendix C.3. Let f : R d -> R+ , so that f (t) < 1+ ^ d +i ■ Then, for all i E N* , 

NeW and ( Xi ) iew E Sf , 

o+ir 1 



f{v i -v j + e{x i -x j )}<2 d d 



1+jcH-l ' 

Proof. Similar to the proof of lemma Appendix C.1. □



Proposition Appendix C.4. Assume that condition 2.1 is satisfied.

For all $0 \le \delta < \frac{1}{2}$, there exists $C_\delta > 0$ so that for all $|\epsilon| \le \delta$, for all $\theta \in \Theta$, for all $n \in \mathbb{N}^*$
and for all $x \in (S_X)^n$, the eigenvalues of $R_\theta$ are larger than $C_\delta$.

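Lemma Appendix C.5. Assume that condition 2.1 is satisfied.

For all $|\epsilon| < \frac{1}{2}$ and for all $K \in \mathbb{N}$, there exists $C_{\epsilon,K}$ so that the eigenvalues of $R_\theta^{-1}$ and of
$\frac{\partial^q R_\theta}{\partial \theta_{i_1} ... \partial \theta_{i_q}}$, $0 \le q \le K$, $1 \le i_1, ..., i_q \le p$, are bounded by $C_{\epsilon,K}$, uniformly in $n \in \mathbb{N}^*$, $x \in (S_X)^n$
and $\theta \in \Theta$.

Proof. Using proposition Appendix C.4, we control the eigenvalues of $R_\theta^{-1}$ uniformly in $x$ and
$\theta$. With (1) and lemma Appendix C.1, and using Gershgorin circle theorem, we control the
eigenvalues of $\frac{\partial^q R_\theta}{\partial \theta_{i_1} ... \partial \theta_{i_q}}$. □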
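
The Gershgorin argument used in this proof can be sketched numerically. The example below is only an illustration under assumptions that are not taken from the paper: dimension one, an exponential correlation function, perturbations uniform on $[-1, 1]$ with $\epsilon = 0.4$, and hypothetical helper names. It compares the Gershgorin bound (largest absolute row sum) with a power-iteration estimate of the largest eigenvalue for increasing $n$.

```python
import math
import random

def correlation_matrix(points, ell):
    """Exponential correlation matrix (an illustrative choice of covariance)."""
    return [[math.exp(-abs(s - t) / ell) for t in points] for s in points]

def gershgorin_upper_bound(m):
    """Every eigenvalue lies in a disc centred at m[i][i] with radius
    sum_{j != i} |m[i][j]|, so the largest absolute row sum is an upper bound."""
    return max(sum(abs(v) for v in row) for row in m)

def largest_eigenvalue(m, iterations=200):
    """Power iteration for a symmetric positive semi-definite matrix."""
    n = len(m)
    x = [1.0] * n
    lam = 0.0
    for _ in range(iterations):
        y = [sum(m[i][j] * x[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(v * v for v in y))
        x = [v / lam for v in y]
    return lam

rng = random.Random(0)
bounds = {}
for n in (20, 40, 80):
    pts = [i + 0.4 * rng.uniform(-1.0, 1.0) for i in range(1, n + 1)]
    R = correlation_matrix(pts, ell=1.0)
    bounds[n] = (largest_eigenvalue(R), gershgorin_upper_bound(R))
```

Because the covariance decays fast enough and the points stay separated, the row sums remain bounded as $n$ grows, which is the kind of uniform control in $n$ and $x$ that the lemma requires.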



Lemma Appendix C.6. For $M$ symmetric real non-negative matrix, $\inf_i \phi_i\{diag(M)\} \ge \inf_i \phi_i(M)$
and $\sup_i \phi_i\{diag(M)\} \le \sup_i \phi_i(M)$. Furthermore, if for two sequences of symmetric
matrices $M_n$ and $N_n$, $M_n \sim N_n$, then $diag(M_n) \sim diag(N_n)$.

Proof. We use $M_{i,i} = e_i^t M e_i$, where $(e_i)_{i=1...n}$ is the standard basis of $\mathbb{R}^n$. Hence $\inf_i \phi_i(M) \le M_{i,i} \le \sup_i \phi_i(M)$
for a symmetric real non-negative matrix $M$. We also use $|diag(M)| \le |M|$. □

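The next proposition gives a law of large numbers for the matrices that can be written using
only matrix multiplications, the matrix $R_\theta^{-1}$, the matrices $\frac{\partial^k}{\partial \theta_{i_1} ... \partial \theta_{i_k}} R_\theta$, the diag operator applied
to the symmetric products of matrices $R_\theta$, $R_\theta^{-1}$ and $\frac{\partial^k}{\partial \theta_{i_1} ... \partial \theta_{i_k}} R_\theta$, and the matrix $diag(R_\theta^{-1})^{-1}$.
Examples of sums of these matrices are the matrices $\Sigma_{ML}$, $\Sigma_{CV,1}$ and $\Sigma_{CV,2}$ of propositions 3.2
and 3.5.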



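Proposition Appendix C.7. Assume that condition 2.1 is satisfied.

Let $\theta \in \Theta$. We denote the set of multi-indexes $S_p := \cup_{k \in \{0,1,2,3\}} \{1, \ldots, p\}^k$. For $I = (i_1, \ldots, i_k) \in S_p$, we denote $m(I) = k$. Then, we denote for $I \in S_p \cup \{-1\}$

$$R_\theta^I := \begin{cases} \frac{\partial^{m(I)}}{\partial\theta_{I_1} \ldots \partial\theta_{I_{m(I)}}} R_\theta & \text{if } I \in S_p \\ R_\theta^{-1} & \text{if } I = -1 \end{cases}$$

We then denote

• $M_{nd}^I = R_\theta^I$ for $I \in S_{nd} := (S_p \cup \{-1\})$
• $M_{sd}^1 = diag(R_\theta^{-1})^{-1}$
• $M_{bd}^I = diag(R_\theta^{I_1} \ldots R_\theta^{I_{m(I)}})$ for $I \in S_{bd} := \cup_{k \in \mathbb{N}^*} S_{nd}^k$

We then define $\mathcal{M}_\theta$ as the set of sequences of random matrices (defined on $(\Omega_X, \mathcal{F}_X, P_X)$), indexed by $n \in \mathbb{N}^*$, dependent on $X$, which can be written $M_{d_1}^{I_1} \ldots M_{d_K}^{I_K}$ with $\{d_1, I_1\}, \ldots, \{d_K, I_K\} \in (\{nd\} \times S_{nd}) \cup (\{sd\} \times \{1\}) \cup (\{bd\} \times S_{bd})$, and so that, for the matrices $M_{d_j}^{I_j}$ so that $d_j = bd$, the matrix $R_\theta^{(I_j)_1} \ldots R_\theta^{(I_j)_{m(I_j)}}$ is symmetric.

Then, for every matrix $M_{d_1}^{I_1} \ldots M_{d_K}^{I_K}$ of $\mathcal{M}_\theta$, the singular values of $M_{d_1}^{I_1} \ldots M_{d_K}^{I_K}$ are bounded uniformly in $\theta$, $n$ and $x \in (S_X)^n$. Then, denoting $S_n := \frac{1}{n} Tr(M_{d_1}^{I_1} \ldots M_{d_K}^{I_K})$, there exists a deterministic limit $S$, which only depends on $\epsilon$, $\theta$ and $(d_1, I_1), \ldots, (d_K, I_K)$, so that $S_n \to S$ $P_X$-almost surely. Hence $S_n \to S$ in quadratic mean and $var(S_n) \to 0$ as $n \to +\infty$.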

Proposition Appendix C.8. Assume that condition 2.1 is satisfied.

Let $M \in \mathcal{M}_\theta$ (proposition Appendix C.7). Then, $\frac{1}{n} y^t M y$ converges to $\Sigma := \lim_{n \to +\infty} \frac{1}{n} Tr(M R_{\theta_0})$, in the mean square sense (on the product space).

Proposition Appendix C.9. Assume that condition 2.1 is satisfied.

We recall $X \sim \mathcal{L}_X^{\otimes n}$ and $y_i = Y(i + \epsilon X_i)$, $1 \leq i \leq n$. We consider symmetric matrix sequences $M_1, \ldots, M_p$ and $N_1, \ldots, N_p$ (defined on $(\Omega_X, \mathcal{F}_X, P_X)$), functions of $X$, so that the eigenvalues of $N_1, \ldots, N_p$ are bounded uniformly in $n$ and $x \in (S_X)^n$, $Tr(M_i + N_i R) = 0$ for $1 \leq i \leq p$ and there exists a $p \times p$ matrix $\Sigma$ so that $\frac{1}{n} Tr(N_i R N_j R) \to (\Sigma)_{i,j}$ $P_X$-almost surely. Then the sequence of $p$-dimensional random vectors (defined on the product space) $\left( \frac{1}{\sqrt{n}} \{Tr(M_i) + y^t N_i y\} \right)_{i=1...p}$ converges in law to a Gaussian random vector with mean zero and covariance matrix $2\Sigma$.

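Proposition Appendix C.10. We recall $X \sim \mathcal{L}_X^{\otimes n}$ and $y_i = Y(i + \epsilon X_i)$, $1 \leq i \leq n$. We consider a consistent estimator $\hat{\theta} \in \mathbb{R}^p$ so that $P(c(\hat{\theta}) = 0) \to 1$, for a function $c : \Theta \to \mathbb{R}^p$, dependent on $X$ and $Y$, and twice differentiable in $\theta$. We assume that $\sqrt{n} c(\theta_0) \to \mathcal{N}(0, \Sigma_1)$ in law, for a $p \times p$ matrix $\Sigma_1$, and that the matrix $\frac{\partial c(\theta_0)}{\partial\theta}$ converges in probability to a $p \times p$ positive matrix $\Sigma_2$ (convergences are defined on the product space). Finally we assume that $\sup_{\tilde{\theta},i,j,k} \left| \frac{\partial^2}{\partial\theta_i \partial\theta_j} c_k(\tilde{\theta}) \right|$ is bounded in probability.
Then

$$\sqrt{n}(\hat{\theta} - \theta_0) \to \mathcal{N}(0, \Sigma_2^{-1} \Sigma_1 \Sigma_2^{-1}).$$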



Proposition Appendix C.10 can be proved using standard M-estimator techniques. In subsection Appendix C.2 we give a short proof for consistency.



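Appendix C.2. Proof of the technical results
Proof of lemma Appendix C.1

Proof.

$$\sum_{j \in \mathbb{N}^*, j \neq i} f\{v_i - v_j + \epsilon(x_i - x_j)\} \leq \sum_{v \in \mathbb{Z}^d \setminus \{0\}} \sup_{\delta_v \in [-1,1]^d} f(v + \delta_v) = \sum_{j \in \mathbb{N}} \sum_{v \in \{-j-1, \ldots, j+1\}^d \setminus \{-j, \ldots, j\}^d} \sup_{\delta_v \in [-1,1]^d} f(v + \delta_v).$$

For $v \in \{-j-1, \ldots, j+1\}^d \setminus \{-j, \ldots, j\}^d$, $|v + \delta_v|_\infty \geq j$. The cardinality of the set $\{-j-1, \ldots, j+1\}^d \setminus \{-j, \ldots, j\}^d$ is

$$(2j+3)^d - (2j+1)^d = \int_{2j+1}^{2j+3} d \, t^{d-1} dt \leq 2d(2j+3)^{d-1} = 2^d d \left(j + \frac{3}{2}\right)^{d-1}.$$

Hence

$$\sum_{j \in \mathbb{N}^*, j \neq i} f\{v_i - v_j + \epsilon(x_i - x_j)\} \leq \sum_{j \in \mathbb{N}} 2^d d \left(j + \frac{3}{2}\right)^{d-1} \frac{1}{1 + j^{d+1}}.$$

□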
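The counting step of this proof can be checked by brute force. The sketch below (function names are illustrative, not from the paper) enumerates the points of $\{-j-1, \ldots, j+1\}^d$ lying outside $\{-j, \ldots, j\}^d$ for small $j$ and $d$, and compares the count with $(2j+3)^d - (2j+1)^d$ and with the bound $2^d d (j + 3/2)^{d-1}$.

```python
from itertools import product

def shell_cardinality(j: int, d: int) -> int:
    """Count points of {-j-1,...,j+1}^d that are not in {-j,...,j}^d.
    These are exactly the integer points whose sup-norm equals j + 1."""
    outer = range(-j - 1, j + 2)
    return sum(1 for v in product(outer, repeat=d)
               if max(abs(c) for c in v) == j + 1)

def shell_bound(j: int, d: int) -> float:
    """Upper bound 2^d * d * (j + 3/2)^(d-1) used in the proof."""
    return 2 ** d * d * (j + 1.5) ** (d - 1)

for d in (1, 2, 3):
    for j in range(4):
        count = shell_cardinality(j, d)
        # Exact cardinality as a difference of two cube sizes.
        assert count == (2 * j + 3) ** d - (2 * j + 1) ** d
        # Bound obtained from the integral of d * t^(d-1) over [2j+1, 2j+3].
        assert count <= shell_bound(j, d)
```

For $d = 1$ the bound is attained (two points on each shell), which shows that the constant $2^d d$ cannot be improved in general.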






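Proof of proposition Appendix C.4

Proof. Let $h : \mathbb{R}^d \to \mathbb{R}$ so that $\hat{h}(f) = \mathbf{1}_{|f|^2 \in [-1,1]} \exp\left(-\frac{1}{1 - |f|^2}\right)$. Then, $\hat{h}$ is $C^\infty$ and with compact support, so there exists $C > 0$ so that $|h(t)| \leq \frac{C}{1 + |t|_\infty^{d+1}}$.

Hence, from lemma Appendix C.2, there exists $0 < a < \infty$ so that for all $i \in \mathbb{N}^*$,

$$\sum_{j \in \mathbb{N}^*, j \neq i} h[a\{v_i - v_j + \epsilon(x_i - x_j)\}] \leq C 2^d d \sum_{j \in \mathbb{N}} \frac{\left(j + \frac{3}{2}\right)^{d-1}}{1 + a^{d+1}(j + 1 - 2\delta)^{d+1}} \leq \frac{1}{2} h(0).$$

Hence, using Gershgorin circle theorem, for all $t_1, \ldots, t_n \in \mathbb{R}$, $x_1, \ldots, x_n \in S_X$,

$$\frac{1}{2} h(0) \sum_{i=1}^n t_i^2 \leq \sum_{i,j=1}^n t_i t_j h[a\{v_i - v_j + \epsilon(x_i - x_j)\}] = \sum_{i,j=1}^n t_i t_j \frac{1}{a^d} \int_{\mathbb{R}^d} \hat{h}\left(\frac{f}{a}\right) e^{i f \cdot \{v_i - v_j + \epsilon(x_i - x_j)\}} df.$$

Hence, as $(\theta, f) \to \hat{K}_\theta(f)$ is continuous and positive, using a compacity argument, there exists $C_2 > 0$ so that for all $\theta \in \Theta$, $f \in [-a, a]^d$, $\hat{K}_\theta(f) \geq C_2 \hat{h}\left(\frac{f}{a}\right)$. Hence,

$$\sum_{i,j=1}^n t_i t_j K_\theta\{v_i - v_j + \epsilon(x_i - x_j)\} \geq C_2 a^d \frac{1}{2} h(0) \sum_{i=1}^n t_i^2.$$

□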



Proof of proposition Appendix C.7
Proof. Let $M_{d_1}^{I_1} \ldots M_{d_K}^{I_K} \in \mathcal{M}_\theta$ be fixed in the proof.

The eigenvalues of $R_\theta^I$, $I \in S_{nd}$, are bounded uniformly with respect to $n$, $\theta$ and $x$ (lemma Appendix C.5). Then, using lemma Appendix C.6, we show that the eigenvalues of $diag(R_\theta^{-1})^{-1}$ are bounded uniformly in $x$, $n$ and $\theta$. Then, for $M_{bd}^I = diag(R_\theta^{I_1} \ldots R_\theta^{I_{m(I)}})$, the eigenvalues of $R_\theta^{I_1} \ldots R_\theta^{I_{m(I)}}$ are bounded by the product of the eigenvalues of $R_\theta^{I_1}, \ldots, R_\theta^{I_{m(I)}}$. Hence we use lemma Appendix C.6 to show that the eigenvalues of $M_{bd}^I$ are bounded uniformly in $n$, $\theta$ and $x$. Finally we use $||A_1 \ldots A_K|| \leq ||A_1|| \ldots ||A_K||$ to show that $||M_{d_1}^{I_1} \ldots M_{d_K}^{I_K}||$ is bounded uniformly in $n$, $\theta$ and $x$.

We decompose $n$ into $n = N_1^d n_2 + r$ with $N_1, n_2, r \in \mathbb{N}$ and $r < N_1^d$. We define $C(v_i)$ as the unique $u \in \mathbb{N}^d$ so that $v_i \in \prod_{k=1}^d \{N_1 u_k + 1, \ldots, N_1(u_k + 1)\}$.

We then define the sequence of matrices $\tilde{R}_\theta$ by $(\tilde{R}_\theta)_{i,j} = (R_\theta)_{i,j} \mathbf{1}_{C(v_i) = C(v_j)}$. We denote $\tilde{M}_{d_1}^{I_1} \ldots \tilde{M}_{d_K}^{I_K}$ the matrix built by replacing $R_\theta$ by $\tilde{R}_\theta$ in the expression of $M_{d_1}^{I_1} \ldots M_{d_K}^{I_K}$ (we also make the substitution for the inverse and the partial derivatives).



Lemma Appendix C.11. $|\tilde{M}_{d_1}^{I_1} \ldots \tilde{M}_{d_K}^{I_K} - M_{d_1}^{I_1} \ldots M_{d_K}^{I_K}| \to 0$, uniformly in $x \in (S_X)^n$, when $N_1, n_2 \to \infty$.
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Proof. Let 6 > and N so that T N := C 2 2 2d d 2 EjeN,j>iV-i ^ 6 - Then: 



i,j=l v ^ 



X! ^{^-Uj+e^i-^)}, 
l<i,3<n,C(« 4 )^C(^) 

1 - 

- ^E X/ K%{vi-Vj + e(xi-Xj)}. 

There exists a unique $a$ so that $(aN_1)^d \leq n < \{(a+1)N_1\}^d$. Among the $n$ observation points, $(aN_1)^d$ are in the $C(v)$, for $v \in \{1, \ldots, a\}^d$. The number of remaining points is less than $dN_1\{(a+1)N_1\}^{d-1}$. Therefore, using the decay bound on the covariance function,



< 



- J" V V X 2 {w i -i; J +e(a; l -a ; ,)} + -d7V 1 {(a + l)7V 1 } d ^o, 



ue{l,...,a} d l<i<n,VieC(w) ieK* 5 C(u 3 )/C(t;i) 



71 ^ ^ ^ ^ 



X! #e {«i - «j + e - + o (1) 



ue{l,...,a} J l^ri^eC^) jeN*,C(u,-)#C(ui) 



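Then, for fixed $v$, the cardinality of the set of the integers $i \in \{1, \ldots, n\}$ so that $v_i \in C(v)$ and there exists $j \in \mathbb{N}^*$, with $C(v_j) \neq C(v_i)$, so that $|v_i - v_j|_\infty \leq N$ is $N_1^d - (N_1 - 2N)^d$ and is less than $2NdN_1^{d-1}$. Hence, using the decay bound on the covariance function, lemmas Appendix C.1 and Appendix C.3,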



Re — Re 



< - (2NdNf- 1 T + N^T N )+o(l) 

v£{l,....a} d 

~ j^ ad {( 2NdN ?~ lT * +N ? TN )} +0 ^- 



This last term is smaller than $2\delta$ for $N_1$ and $n_2$ large enough. Hence we showed $|\tilde{R}_\theta - R_\theta| \to 0$ uniformly in $x$, when $N_1, n_2 \to \infty$. We can show the same result for the partial derivatives of $R_\theta$ with respect to $\theta$.

Finally we use [8] theorem 2.1 to show that $|\tilde{R}_\theta^{-1} - R_\theta^{-1}| \to 0$ uniformly in $x$, when $N_1, n_2 \to \infty$.

Hence, using [8], theorem 2.1 and lemma Appendix C.6, $|\tilde{M}_d^I - M_d^I|$ converges to $0$ uniformly in $x$ when $N_1, n_2 \to \infty$, for $d \in \{nd, sd, bd\}$ and $I \in S_{nd} \cup \{1\} \cup S_{bd}$. We conclude using [8], theorem 2.1.

□

We denote, for every $N_1$, $n_2$ and $r$, with $0 \leq r < N_1^d$, $n = N_1^d n_2 + r$ and $\tilde{S}_{N_1,n_2} := \frac{1}{n} Tr(\tilde{M}_{d_1}^{I_1} \ldots \tilde{M}_{d_K}^{I_K})$, which is a sequence of real random variables defined on $(\Omega_X, \mathcal{F}_X, P_X)$ and indexed by $N_1$, $n_2$ and $r$. Using [8], corollary 2.1 and lemma Appendix C.11, $|S_n - \tilde{S}_{N_1,n_2}| \to 0$ uniformly in $x$ when $N_1, n_2 \to \infty$ (uniformly in $r$). As the matrices in the expression of $\tilde{S}_{N_1,n_2}$ are block diagonal, we can write $\tilde{S}_{N_1,n_2} = \frac{1}{n_2} \sum_{l=1}^{n_2} S^l_{N_1^d} + o\left(\frac{1}{n_2}\right)$, where the $S^l_{N_1^d}$ are iid random variables defined on $(\Omega_X, \mathcal{F}_X, P_X)$ with the distribution of $S_{N_1^d}$. We denote $\bar{S}_{N_1^d} := E_X(S_{N_1^d})$. Then, using the strong law of large numbers, for fixed $N_1$, $\tilde{S}_{N_1,n_2} \to \bar{S}_{N_1^d}$ $P_X$-almost surely when $n_2 \to \infty$ (uniformly in $r$).

For every $N_1, p_{N_1}, n_2 \in \mathbb{N}^*$, there exist a unique $n_2', r \in \mathbb{N}$ so that $(N_1 + p_{N_1})^d n_2 = N_1^d n_2' + r$. Then we have

$$|\bar{S}_{N_1^d} - \bar{S}_{(N_1 + p_{N_1})^d}| \leq |\bar{S}_{N_1^d} - \tilde{S}_{N_1,n_2'}| + |\tilde{S}_{N_1,n_2'} - S_{N_1^d n_2' + r}| + |S_{N_1^d n_2' + r} - S_{(N_1 + p_{N_1})^d n_2}| + |S_{(N_1 + p_{N_1})^d n_2} - \tilde{S}_{N_1 + p_{N_1},n_2}| + |\tilde{S}_{N_1 + p_{N_1},n_2} - \bar{S}_{(N_1 + p_{N_1})^d}| = A + B + C + D + E. \quad \text{(C.1)}$$

Because $n_2'$ and $r$ depend on $N_1$, $p_{N_1}$ and $n_2$, $A$, $B$, $C$, $D$ and $E$ are sequences of random variables defined on $(\Omega_X, \mathcal{F}_X, P_X)$ and indexed by $N_1$, $p_{N_1}$ and $n_2$. We have seen that there exists $\tilde{\Omega}_X \subset \Omega_X$, with $P_X(\tilde{\Omega}_X) = 1$ so that for $\omega_X \in \tilde{\Omega}_X$, when $N_1, n_2 \to +\infty$, we also have $N_1 + p_{N_1}, n_2' \to +\infty$, and so $B$ and $D$ converge to zero.

Now, for every $N_1 \in \mathbb{N}^*$, let $\Omega_{X,N_1}$ be so that $P_X(\Omega_{X,N_1}) = 1$ and for all $\omega_X \in \Omega_{X,N_1}$, $\tilde{S}_{N_1,n_2} \to_{n_2 \to +\infty} \bar{S}_{N_1^d}$. Let $\bar{\Omega}_X = \cap_{N_1 \in \mathbb{N}^*} \Omega_{X,N_1}$. Then $P_X(\bar{\Omega}_X) = 1$ and for all $\omega_X \in \bar{\Omega}_X$, for all $N_1 \in \mathbb{N}^*$, $\tilde{S}_{N_1,n_2} \to_{n_2 \to +\infty} \bar{S}_{N_1^d}$.

We will now show that $N_1 \to \bar{S}_{N_1^d}$ is a Cauchy sequence. Let $\delta > 0$. $P_X(\tilde{\Omega}_X \cap \bar{\Omega}_X) = 1$ so this set is non-empty. Let us fix $\omega_X \in \tilde{\Omega}_X \cap \bar{\Omega}_X$. In (C.1), $C$ is null. There exist $\bar{N}_1$ and $\bar{n}_2$ so that for every $N_1 \geq \bar{N}_1$, $n_2 \geq \bar{n}_2$, $p_{N_1} > 0$, $B$ and $D$ are smaller than $\delta$. Let us now fix any $N_1 \geq \bar{N}_1$. Then, for every $p_{N_1} > 0$, with $n_2 \geq \bar{n}_2$ large enough, $A$ and $E$ are smaller than $\delta$.

Hence, we showed that $N_1 \to \bar{S}_{N_1^d}$ is a Cauchy sequence and we denote its limit by $S$. Since $N_1 \to \bar{S}_{N_1^d}$ is deterministic, $S$ is deterministic and $\bar{S}_{N_1^d} \to_{N_1 \to +\infty} S$.

Finally, let $n = N_1^d n_2 + r$ with $N_1, n_2 \to \infty$. Then

$$|S_n - S| \leq |S_n - \tilde{S}_{N_1,n_2}| + |\tilde{S}_{N_1,n_2} - \bar{S}_{N_1^d}| + |\bar{S}_{N_1^d} - S|.$$

Using the same arguments as before, we show that, $P_X$-a.s., $|S_n - S| \to 0$ as $n \to +\infty$.

□



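Proof of proposition Appendix C.8

Proof. $E\left(\frac{1}{n} y^t M y\right) = E\left\{E\left(\frac{1}{n} y^t M y | X\right)\right\} = E\left\{\frac{1}{n} Tr(M R_{\theta_0})\right\} \to \Sigma$. Furthermore $var\left(\frac{1}{n} y^t M y\right) = E\left\{var\left(\frac{1}{n} y^t M y | X\right)\right\} + var\left\{E\left(\frac{1}{n} y^t M y | X\right)\right\}$. $var\left(\frac{1}{n} y^t M y | X = x\right)$ is a $O\left(\frac{1}{n}\right)$, uniformly in $x$, using proposition Appendix C.7 and $||A + B|| \leq ||A|| + ||B||$. Therefore $var\left(\frac{1}{n} y^t M y | X\right)$ is bounded by $O\left(\frac{1}{n}\right)$ $P_X$-a.s. $var\left\{E\left(\frac{1}{n} y^t M y | X\right)\right\} = var\left\{\frac{1}{n} Tr(M R_{\theta_0})\right\} \to 0$, using proposition Appendix C.7. Hence $\frac{1}{n} y^t M y$ converges to $\Sigma$ in the mean square sense.

□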
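As a worked step, the conditional moments used in this proof follow from a standard identity for Gaussian quadratic forms. It is stated here under the assumption that $M$ is symmetric (otherwise $M$ can be replaced by $\frac{1}{2}(M + M^t)$ without changing $y^t M y$) and that, conditionally to $X = x$, $y$ is a centered Gaussian vector with covariance matrix $R$:

$$E(y^t M y | X = x) = Tr(MR), \qquad var(y^t M y | X = x) = 2 Tr(MRMR),$$

so that, with $||.||$ the largest singular value,

$$var\left(\frac{1}{n} y^t M y | X = x\right) = \frac{2}{n^2} Tr(MRMR) \leq \frac{2}{n} ||MR||^2.$$

The right-hand side is a $O\left(\frac{1}{n}\right)$ as soon as the singular values of $M$ and $R$ are bounded uniformly in $x$, which is the kind of bound provided by proposition Appendix C.7.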



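Proof of proposition Appendix C.9
Proof. Let $v_\lambda = (\lambda_1, \ldots, \lambda_p)^t \in \mathbb{R}^p$.

$$E\left(\exp\left[i \sum_{k=1}^p \lambda_k \frac{1}{\sqrt{n}} \{Tr(M_k) + y^t N_k y\}\right]\right) = E\left\{E\left(\exp\left[i \sum_{k=1}^p \lambda_k \frac{1}{\sqrt{n}} \{Tr(M_k) + y^t N_k y\}\right] \Big| X\right)\right\}.$$

For fixed $x = (x_1, \ldots, x_n) \in (S_X)^n$, denoting $\sum_{k=1}^p \lambda_k R^{\frac{1}{2}} N_k R^{\frac{1}{2}} = P^t D P$, with $P^t P = I_n$ and $D$ diagonal, $z = P R^{-\frac{1}{2}} y$ (which is a vector of iid standard Gaussian variables, conditionally to $X = x$), we have

$$\sum_{k=1}^p \lambda_k \frac{1}{\sqrt{n}} \{Tr(M_k) + y^t N_k y\} = \frac{1}{\sqrt{n}} \left\{ Tr\left(\sum_{k=1}^p \lambda_k M_k\right) + \sum_{i=1}^n \phi_i\left(\sum_{k=1}^p \lambda_k R^{\frac{1}{2}} N_k R^{\frac{1}{2}}\right) z_i^2 \right\} = \frac{1}{\sqrt{n}} \sum_{i=1}^n \phi_i\left(\sum_{k=1}^p \lambda_k R^{\frac{1}{2}} N_k R^{\frac{1}{2}}\right) \{z_i^2 - 1\}.$$

Hence

$$var\left(\sum_{k=1}^p \lambda_k \frac{1}{\sqrt{n}} \{Tr(M_k) + y^t N_k y\} \Big| X\right) = \frac{2}{n} \sum_{i=1}^n \phi_i^2\left(\sum_{k=1}^p \lambda_k R^{\frac{1}{2}} N_k R^{\frac{1}{2}}\right) = \frac{2}{n} \sum_{k=1}^p \sum_{l=1}^p \lambda_k \lambda_l Tr(R N_k R N_l) \to_{n \to +\infty} v_\lambda^t (2\Sigma) v_\lambda \text{ for a.e. } \omega_X.$$

Hence, for almost every $\omega_X$, we can apply Lindeberg-Feller criterion to the $\Omega_Y$-measurable variables $\frac{1}{\sqrt{n}} \phi_i\left(\sum_{k=1}^p \lambda_k R^{\frac{1}{2}} N_k R^{\frac{1}{2}}\right) \{(z_i)^2 - 1\}$, $1 \leq i \leq n$, to show that $\sum_{k=1}^p \lambda_k \frac{1}{\sqrt{n}} \{Tr(M_k) + y^t N_k y\}$ converges in law to $\mathcal{N}(0, v_\lambda^t (2\Sigma) v_\lambda)$. Hence, $E\left(\exp\left[i \sum_{k=1}^p \lambda_k \frac{1}{\sqrt{n}} \{Tr(M_k) + y^t N_k y\}\right] \Big| X\right)$ converges for almost every $\omega_X$ to $\exp\left(-\frac{1}{2} v_\lambda^t (2\Sigma) v_\lambda\right)$. Using the dominated convergence theorem on $(\Omega_X, \mathcal{F}_X, P_X)$, $E\left(\exp\left[i \sum_{k=1}^p \lambda_k \frac{1}{\sqrt{n}} \{Tr(M_k) + y^t N_k y\}\right]\right)$ converges to $\exp\left\{-\frac{1}{2} v_\lambda^t (2\Sigma) v_\lambda\right\}$.

□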



Proof of proposition Appendix C.10

Proof. It is enough to consider the case $c(\hat{\theta}) = 0$, the case $P\{c(\hat{\theta}) = 0\} \to 1$ being deduced from it by modifying $c$ on a set with vanishing probability measure, which does not affect the convergence in law. For all $1 \leq k \leq p$

$$0 = c_k(\hat{\theta}) = c_k(\theta_0) + \left\{\frac{\partial}{\partial\theta} c_k(\theta_0)\right\}^t (\hat{\theta} - \theta_0) + r,$$

with random $r$, so that $|r| \leq \sup_{\tilde{\theta},i,j,k} \left| \frac{\partial^2}{\partial\theta_i \partial\theta_j} c_k(\tilde{\theta}) \right| \times |\hat{\theta} - \theta_0|^2$. Hence $r = o_p(|\hat{\theta} - \theta_0|)$. We then have

$$-c_k(\theta_0) = \left[\left\{\frac{\partial}{\partial\theta} c_k(\theta_0)\right\}^t + o_p(1)\right] (\hat{\theta} - \theta_0)$$

and so

$$\hat{\theta} - \theta_0 = -\left\{\frac{\partial}{\partial\theta} c(\theta_0) + o_p(1)\right\}^{-1} c(\theta_0). \quad \text{(C.2)}$$

We conclude using Slutsky lemma.

□

Remark. One can show that, with probability going to one as $n \to +\infty$, the likelihood has a unique global minimizer. Indeed, we first notice that the set of the minimizers is a subset of any open ball of center $\theta_0$ with probability going to one. For a small enough open ball, the probability that the likelihood function is strictly convex on this open ball converges to one. This is because of the third-order regularity of the likelihood with respect to $\theta$, and because the limit of the second derivative matrix of the likelihood at $\theta_0$ is positive.



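Appendix D. Exact expressions of the asymptotic variances at $\epsilon = 0$ for $d = 1$

In this section we only address the case $d = 1$ and $p = 1$, where the observation points $v_i + \epsilon X_i$, $1 \leq i \leq n$, $n \in \mathbb{N}^*$, are the $i + \epsilon X_i$, where $X_i$ is uniform on $[-1, 1]$, and $\Theta = [\theta_{inf}, \theta_{sup}]$.

We define the Fourier transform function $\hat{s}(.)$ of a sequence $s_n$ of $\mathbb{Z}$ by $\hat{s}(f) = \sum_{n \in \mathbb{Z}} s_n e^{i n f}$ as in [8]. This function is $2\pi$ periodic on $[-\pi, \pi]$. Then

• The sequence of the $K_{\theta_0}(i)$, $i \in \mathbb{Z}$, has Fourier transform $f$ which is even and non-negative on $[-\pi, \pi]$.

• The sequence of the $\frac{\partial}{\partial\theta} K_{\theta_0}(i)$, $i \in \mathbb{Z}$, has Fourier transform $f_\theta$ which is even on $[-\pi, \pi]$.

• The sequence of the $\frac{\partial}{\partial t} K_{\theta_0}(i) \mathbf{1}_{i \neq 0}$, $i \in \mathbb{Z}$, has Fourier transform $i f_t$ which is odd and imaginary on $[-\pi, \pi]$.

• The sequence of the $\frac{\partial}{\partial t} \frac{\partial}{\partial\theta} K_{\theta_0}(i) \mathbf{1}_{i \neq 0}$, $i \in \mathbb{Z}$, has Fourier transform $i f_{t,\theta}$ which is odd and imaginary on $[-\pi, \pi]$.

• The sequence of the $\frac{\partial^2}{\partial t^2} K_{\theta_0}(i) \mathbf{1}_{i \neq 0}$, $i \in \mathbb{Z}$, has Fourier transform $f_{t,t}$ which is even on $[-\pi, \pi]$.

• The sequence of the $\frac{\partial^2}{\partial t^2} \frac{\partial}{\partial\theta} K_{\theta_0}(i) \mathbf{1}_{i \neq 0}$, $i \in \mathbb{Z}$, has Fourier transform $f_{t,t,\theta}$ which is even on $[-\pi, \pi]$.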
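To make the first item concrete, here is a small sketch for one covariance function chosen purely for illustration (it is an assumption, not the paper's setting): the exponential kernel $K_\theta(i) = \exp(-|i|/\theta)$ with $\theta = 1.5$. For this sequence the Fourier transform has the closed form $(1 - r^2)/(1 - 2r\cos\omega + r^2)$ with $r = e^{-1/\theta}$, so evenness, non-negativity and the mean value on $[-\pi, \pi]$ can all be checked numerically. Function names are hypothetical.

```python
import cmath
import math

def fourier_transform(sequence, omega, half_width):
    """Truncated sum over n in {-half_width, ..., half_width} of s_n * exp(i n omega)."""
    return sum(sequence(n) * cmath.exp(1j * n * omega)
               for n in range(-half_width, half_width + 1))

def mean_value(func, grid_size=512):
    """Mean value of a 2*pi-periodic function on [-pi, pi] (uniform Riemann sum)."""
    step = 2.0 * math.pi / grid_size
    return sum(func(-math.pi + k * step) for k in range(grid_size)) / grid_size

theta = 1.5                                   # assumed value, for illustration
r = math.exp(-1.0 / theta)
kernel = lambda n: math.exp(-abs(n) / theta)  # hypothetical K_theta(i)
closed_form = lambda w: (1 - r * r) / (1 - 2 * r * math.cos(w) + r * r)
# Terms beyond |n| = 80 are of order exp(-80 / theta) and are neglected.
f = lambda w: fourier_transform(kernel, w, half_width=80)

for w in (0.0, 0.7, 2.5):
    assert abs(f(w).imag) < 1e-12                    # real-valued
    assert abs(f(w) - f(-w)) < 1e-12                 # even
    assert abs(f(w).real - closed_form(w)) < 1e-9    # matches the closed form
    assert f(w).real > 0                             # non-negative

# The mean value of the Fourier transform recovers the term of index 0.
assert abs(mean_value(lambda w: f(w).real) - kernel(0)) < 1e-9
```

The last check uses the fact that the mean value of $e^{i n \omega}$ on $[-\pi, \pi]$ vanishes for $n \neq 0$, so that the mean value of the Fourier transform is the term of index $0$ of the sequence.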

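In this section we assume in condition Appendix D.1 that all these sequences are dominated by a decreasing exponential function, so that the Fourier transforms are $C^\infty$. This condition could be weakened, but it simplifies the proofs, and it is satisfied in our framework.

Condition Appendix D.1. There exist $C < \infty$ and $a > 0$ so that the sequences of general terms $K_{\theta_0}(i)$, $\frac{\partial}{\partial\theta} K_{\theta_0}(i)$, $\frac{\partial}{\partial t} K_{\theta_0}(i) \mathbf{1}_{i \neq 0}$, $\frac{\partial}{\partial t} \frac{\partial}{\partial\theta} K_{\theta_0}(i) \mathbf{1}_{i \neq 0}$, $\frac{\partial^2}{\partial t^2} K_{\theta_0}(i) \mathbf{1}_{i \neq 0}$, $\frac{\partial^2}{\partial t^2} \frac{\partial}{\partial\theta} K_{\theta_0}(i) \mathbf{1}_{i \neq 0}$, $i \in \mathbb{Z}$, are bounded by $C e^{-a|i|}$.

For a $2\pi$-periodic function $f$ on $[-\pi, \pi]$, we denote by $M(f)$ the mean value of $f$ on $[-\pi, \pi]$. Then, proposition Appendix D.2 gives the closed form expressions of $\Sigma_{ML}$, $\Sigma_{CV,1}$, $\Sigma_{CV,2}$ and $\frac{\partial^2}{\partial\epsilon^2} \Sigma_{ML}$ at $\epsilon = 0$.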

Proposition Appendix D.2. Assume that conditions 2.1 and Appendix D.1 are verified.
For $\epsilon = 0$,

'/r 



T,MT ,=M 



+8M I 4 I M ' /( 



fj "' V/ 4 

/ 



- 2 M ( I)- 3 {M(f)M(I)-„(f ^ 
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and 



de 2 



■•ML 




Proposition Appendix D.2 is proved in the supplementary material.

An interesting remark can be made on $\Sigma_{CV,2}$. Using Cauchy-Schwarz inequality, we obtain

$$M\left(\frac{f_\theta^2}{f^3}\right) M\left(\frac{1}{f}\right) - M\left(\frac{f_\theta}{f^2}\right)^2 \geq 0,$$

so that the limit of the second derivative with respect to $\theta$ of the CV criterion at $\theta_0$ is indeed non-negative. Furthermore, for the limit to be zero, it is necessary that $\frac{f_\theta}{f^{3/2}}$ be proportional to $\frac{1}{f^{1/2}}$, that is to say $f_\theta$ be proportional to $f$. This is equivalent to $\frac{\partial K_{\theta_0}}{\partial\theta}$ being proportional to $K_{\theta_0}$ on $\mathbb{Z}$, which happens only when around $\theta_0$, $K_\theta(i) = \frac{\theta}{\theta_0} K_{\theta_0}(i)$, for $i \in \mathbb{Z}$. Hence around $\theta_0$, $\theta$ would be a global variance hyper-parameter. Therefore, we have shown that for the regular grid in dimension one, the asymptotic variance is positive, as long as $\theta$ is not only a global variance hyper-parameter.
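The dichotomy in this remark can be illustrated numerically. The sketch below rests on two assumptions that are not taken from the text: the Cauchy-Schwarz gap is taken to be $M(f_\theta^2/f^3) M(1/f) - M(f_\theta/f^2)^2$, and the covariance function is the exponential kernel $K_\theta(i) = \exp(-|i|/\theta)$, whose Fourier transform is $(1 - r^2)/(1 - 2r\cos\omega + r^2)$ with $r = e^{-1/\theta}$. The gap is evaluated once with $\theta$ acting as a correlation length and once with $\theta$ acting as a pure variance parameter ($f_\theta$ proportional to $f$).

```python
import math

def mean_value(func, grid_size=2048):
    """Mean value of a 2*pi-periodic function on [-pi, pi] (uniform Riemann sum)."""
    step = 2.0 * math.pi / grid_size
    return sum(func(-math.pi + k * step) for k in range(grid_size)) / grid_size

def spectral_density(omega, theta):
    """Fourier transform of the assumed sequence exp(-|i| / theta), i in Z."""
    r = math.exp(-1.0 / theta)
    return (1 - r * r) / (1 - 2 * r * math.cos(omega) + r * r)

def theta_derivative(omega, theta, step=1e-5):
    """Central finite difference of the spectral density with respect to theta."""
    return (spectral_density(omega, theta + step)
            - spectral_density(omega, theta - step)) / (2 * step)

def cauchy_schwarz_gap(f, f_theta):
    """M(f_theta^2 / f^3) * M(1 / f) - M(f_theta / f^2)^2 (assumed form of the gap)."""
    a = mean_value(lambda w: f_theta(w) ** 2 / f(w) ** 3)
    b = mean_value(lambda w: 1.0 / f(w))
    c = mean_value(lambda w: f_theta(w) / f(w) ** 2)
    return a * b - c * c

theta0 = 1.5                                   # assumed value, for illustration
f = lambda w: spectral_density(w, theta0)
# theta as a correlation length: f_theta is not proportional to f.
gap_range = cauchy_schwarz_gap(f, lambda w: theta_derivative(w, theta0))
# theta as a pure variance parameter: f_theta = f / theta0.
gap_variance = cauchy_schwarz_gap(f, lambda w: f(w) / theta0)
```

Under these assumptions `gap_range` is strictly positive while `gap_variance` vanishes up to rounding error, in line with the statement that the limit can only be zero when $\theta$ is a global variance hyper-parameter.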
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